Using the GFW API
Grabbing forest cover loss information for user-supplied polygons
Update: We recently added the ability to query multipolygons from within the GFW API. The procedure detailed in this post still works, but a much simpler one is described in that update.
Suppose a multinational company is trying to rid their supply chain of materials sourced from primary forest. Consider, for example, Mars and their recent commitment to no deforestation of primary forest or areas of high conservation value. How would an analyst support Mars in this effort? Specifically, how can someone with only moderate programming experience measure forest cover loss within a given region? This post shows how we did this, riding on the open source GFW API.
Here is the code and data for this quick tutorial, if you're interested in following along. If you're not, just grab the code and import the following modules into Python:
import os
import json
import requests
1) Convert polygons to GeoJSON. The polygons may be stored in many different formats: often a shapefile, maybe a KML. Whatever the starting format, the objective is the platform-agnostic GeoJSON format. We used the wonderful and open Geospatial Data Abstraction Library (GDAL), and more specifically its ogr2ogr command. Another option is to drag-and-drop your data into CartoDB and export it as GeoJSON. The starting formats can be .csv, .osm, .shp, and many others. We in the Data Lab tend to work in the command line, so we used the following command to convert a shapefile (IDN_adm2.shp) of Indonesia's subprovinces, downloaded from the Global Administrative Areas (GADM) site:
$ ogr2ogr -f GeoJSON map.geojson IDN_adm2.shp -simplify 0.001
You don't need to simplify the polygons with the -simplify option, but I did to make the file slightly more reasonable to work with. The simplification preserves topology within each feature, so it won't significantly change the shapes of the stored polygons. No harm, no foul. To install GDAL and verify that ogr2ogr is available on Ubuntu, run the following commands:
$ sudo add-apt-repository ppa:ubuntugis/ppa && sudo apt-get update
$ sudo apt-get install gdal-bin
$ ogrinfo
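If you'd rather not leave Python at all, the same conversion can be sketched with geopandas. That library is our suggestion here, not something the GFW API requires, so treat this as one option among many:
import geopandas as gpd

# read the shapefile, simplify the geometries, and write GeoJSON
gdf = gpd.read_file('IDN_adm2.shp')
gdf['geometry'] = gdf.geometry.simplify(0.001)  # preserves topology per feature
gdf.to_file('data/map.geojson', driver='GeoJSON')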
2) Read the data into Python. I saved map.geojson into a data/ subdirectory, just to keep track of code and data for this tutorial. But whatever, save it where you please. Then read it into Python with this function:
def _read_data(data='data/map.geojson'):
"""Returns a list of JSON dictionaries, one for each feature of
the polygons.
"""
with open(data) as json_file:
x = json.load(json_file)
return x['features']
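A quick sanity check, assuming map.geojson landed in data/ as above:
features = _read_data()
print len(features)  # one feature per subprovince
print features[0]['properties']['NAME_1'], features[0]['properties']['NAME_2']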
3) Prep shapes for API query. A polygon is defined by a sequence of coordinate pairs: longitude and latitude in degrees. The Earth Engine API expects this sequence to form a ring, with the first and last pairs being the same. The following functions accept a nested list of coordinates and convert them into JSON that defines a polygon. Why do we need to do this? Aren't the shapes already in a JSON-like format? Good question. Yes, but for the GFW layers, the Google Earth Engine API does not accept multipolygons -- shapes comprised of multiple polygons. The API only accepts one polygon at a time. In subsequent functions, we will rip apart the multipolygons; these functions help put the pieces back together. We don't really call these functions directly, but rather use them as components of the final, all-in-one function _process_entry() toward the end of this post. (This may seem silly to Python programmers who are object-oriented; but we come from a Clojure background, a functional language spoken by the cool kids. Not that we're cool, per se, but we intone the dialect.)
def _force_ring(coords):
"""Returns a nested list of coordinates, ensuring that the
coordinates form a ring with the first and last coordinates the
same.
"""
    if coords[0] != coords[-1]:
        # close the ring by repeating the first coordinate at the end
        return [coords + [coords[0]]]
else:
return [coords]
def _polygon(coords):
"""Converts a list of coordinates to json"""
ring = _force_ring(coords)
return json.dumps({"type": "Polygon", "coordinates": ring})
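To see what these functions produce, feed them a toy triangle (made-up coordinates, not from the Indonesia data). Note how _force_ring closes the ring by repeating the first pair; key order in the output may vary:
triangle = [[102.0, -2.0], [103.0, -2.0], [102.5, -1.0]]
print _polygon(triangle)
# {"type": "Polygon", "coordinates": [[[102.0, -2.0], [103.0, -2.0],
#  [102.5, -1.0], [102.0, -2.0]]]}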
4) Hit the API, hard. The next two functions create a parameter dictionary for a POST request to the GFW API. The default data set queried is the University of Maryland (UMD) tree cover loss and gain. You can adjust the end-point through the endpoint variable in _grab_loss() to query the FORMA alerts as well, although the supplied parameters may have to change. We are interested in getting just the forest cover loss for a specific year, relative to the previous year -- for each year from 2001 through 2012. As soon as the newest UMD data (for 2013) are available, we will be able to query that layer as well. No word quite yet on exactly when that will be. The return object is JSON, like this for all of Indonesia. We translate it into a Python dictionary and then grab just the tree cover loss.
def _params(coords, year):
"""Returns a parameter dictionary for a post request"""
p = _polygon(coords)
return {"begin":year-1, "end":year, "geom":p}
def _grab_loss(coords, year):
"""Returns the loss in hectares of the supplied coordinates and
year from the UMD data set hosted on Earth Engine via GFW API.
"""
endpoint = 'http://gfw-apis.appspot.com/datasets/umd'
    res = requests.post(endpoint, data=_params(coords, year))
return res.json()['loss']
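The API can hiccup on big polygons or flaky connections, so it's worth checking the response before indexing into it. Here is a minimal defensive variant of _grab_loss; the error handling is our own addition, not something the GFW API prescribes:
def _grab_loss_safe(coords, year):
    """Like _grab_loss, but fails loudly on a bad response."""
    endpoint = 'http://gfw-apis.appspot.com/datasets/umd'
    res = requests.post(endpoint, data=_params(coords, year))
    res.raise_for_status()  # surface HTTP-level errors immediately
    body = res.json()
    if 'loss' not in body:
        raise ValueError('unexpected response: %s' % body)
    return body['loss']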
5) Create a time series of loss. The following function accepts the GeoJSON representation of a GADM administrative area from data/map.geojson and returns a list of dictionaries with the tree loss data by year. A time series. Ready for analysis. The administrative area is stored as a GeoJSON multipolygon. The function _process_entry() iterates through each component polygon for each year between 2001 and 2012. (Note that the year range in the function ends at 2013, since Python's range() excludes its end point. Also, we are working to update the GFW call to support multipolygons. We will add an addendum to this tutorial when it's ready; but no matter what, this process will work in perpetuity.) The resulting list of dictionaries is ready for conversion to a Pandas data frame, a convenient format for statistical analysis or persistent storage.
def _process_entry(entry):
"""Converts the supplied entry, a multipolygon in json, into a
list of dictionaries with loss information associated with year
and province identifying information. Ready for quick conversion
to Pandas data frame.
"""
name_1 = entry['properties']['NAME_1']
name_2 = entry['properties']['NAME_2']
coord_set = entry['geometry']['coordinates']
res = []
    for yr in range(2001, 2013):
        # each element of coord_set is one polygon of the multipolygon;
        # its first ring is the outer boundary (holes, if any, are ignored)
        x = [_grab_loss(poly[0], yr) for poly in coord_set]
        xx = {'prov': name_1, 'subprov': name_2, 'year': yr, 'loss': sum(x)}
res.append(xx)
return res
6) Use. Ok, so we have these functions. How do I use them? Well, check it. Suppose I am interested in tracking deforestation for a particular subprovince, say the Sarolangun district (or strictly 'kabupaten') in the Jambi province of Indonesia. I will first filter the result of _read_data(), defined above, so that only the multipolygon representation of the district is returned. Then I will pass that object to _process_entry() and convert the final result to a Pandas data frame. Then, for the love, I will have a beer. This was hard work. I am drinking a beer now, in fact, so my typing is getting worse and worzsase. But whatever, bosses, it's Friday evening.
def _filter_admin(prov, subprov, data='data/map.geojson'):
"""Accepts a province and subprovince name (strings) and a data
source in json format and returns the multipolygon associated with
that administrative unit.
"""
polys = _read_data(data)
def _spec_filter(xx):
x = xx['properties']
return (x['NAME_1'] == prov) & (x['NAME_2'] == subprov)
return filter(_spec_filter, polys)
entries = _filter_admin('Jambi', 'Sarolangun')
# flatten: _process_entry() returns a list of yearly rows per entry
x = [row for entry in entries for row in _process_entry(entry)]
import pandas as pd
print pd.DataFrame(x)
loss prov subprov year
6704.959394 Jambi Sarolangun 2001
13286.370658 Jambi Sarolangun 2002
9332.024050 Jambi Sarolangun 2003
10641.634414 Jambi Sarolangun 2004
12703.430806 Jambi Sarolangun 2005
23783.878311 Jambi Sarolangun 2006
31618.787513 Jambi Sarolangun 2007
33496.859472 Jambi Sarolangun 2008
33783.859539 Jambi Sarolangun 2009
19586.615246 Jambi Sarolangun 2010
11447.544018 Jambi Sarolangun 2011
31439.855632 Jambi Sarolangun 2012
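And if you want the series to outlive the beer, Pandas will happily persist it to disk (the filename is just an example):
df = pd.DataFrame(x)
df.to_csv('data/sarolangun_loss.csv', index=False)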
Yeah, so that's about it. Please e-mail us with any questions at datalab@wri.org. We'd be happy to answer them. And if not, we'd be equally happy to just ignore the e-mail altogether. We're good at that.