Using the GFW API
Grabbing forest cover loss information for user-supplied polygons
Update: We recently added the ability to query multipolygons from within the GFW API. The procedure detailed in this post still works, but a much simpler one is described in that update.
Suppose a multinational company is trying to rid their supply chain of materials sourced from primary forest. Consider, for example, Mars and their recent commitment to no deforestation of primary forest or areas of high conservation value. How would an analyst support Mars in this effort? Specifically, how can someone with only moderate programming experience measure forest cover loss within a given region? This post shows how we did this, riding on the open source GFW API.
Here is the code and data for this quick tutorial, if you're interested in following along. If you're not, just grab the code and import the following modules into Python:
import os
import json
import requests
1) Convert polygons to GeoJSON. The polygons may be stored in many different formats: often a shapefile, maybe a KML. Whatever the starting format, the objective is the platform-agnostic GeoJSON format. We used the wonderful and open Geospatial Data Abstraction Library (GDAL), and more specifically its ogr2ogr command. Another option is to drag-and-drop your data into CartoDB and export it as GeoJSON. The starting formats can be .csv, .osm, .shp, and many others. We in the Data Lab tend to work in the command line, so we used the following command to convert a shapefile (IDN_adm2.shp) of Indonesia's subprovinces, downloaded from the Global Administrative Areas (GADM) site:
$ ogr2ogr -f GeoJSON map.geojson IDN_adm2.shp -simplify 0.001
You don't need to simplify the polygons with the -simplify option, but I did to make the file slightly more reasonable to work with. The simplification preserves topology within each feature, so it won't significantly change the shapes of the stored polygons. No harm, no foul. To install GDAL and verify that ogr2ogr is available on Ubuntu, run the following commands:
$ sudo add-apt-repository ppa:ubuntugis/ppa && sudo apt-get update
$ sudo apt-get install gdal-bin
$ ogrinfo
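If you'd rather not leave Python at all, the same conversion can be sketched with geopandas. That library is our suggestion here, not something the GFW API requires, so treat this as one option among many:
import geopandas as gpd

# read the shapefile, simplify the geometries, and write GeoJSON
gdf = gpd.read_file('IDN_adm2.shp')
gdf['geometry'] = gdf.geometry.simplify(0.001)  # preserves topology per feature
gdf.to_file('data/map.geojson', driver='GeoJSON')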
2) Read the data into Python. I saved map.geojson into a data/ subdirectory, just to keep track of code and data for this tutorial. But whatever, save it where you please. Then read it into Python with this function:
def _read_data(data='data/map.geojson'):
"""Returns a list of JSON dictionaries, one for each feature of
the polygons.
"""
with open(data) as json_file:
x = json.load(json_file)
return x['features']
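A quick sanity check, assuming map.geojson landed in data/ as above:
features = _read_data()
print len(features)  # one feature per subprovince
print features[0]['properties']['NAME_1'], features[0]['properties']['NAME_2']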
3) Prep shapes for API query. A polygon is defined by a sequence of coordinate pairs: longitude and latitude in degrees. The Earth Engine API expects this sequence to form a ring, with the first and last pairs being the same. The following functions accept a nested list of coordinates and convert them into JSON that defines a polygon. Why do we need to do this? Aren't the shapes already in a JSON-like format? Good question. Yes, but for the GFW layers, the Google Earth Engine API does not accept multipolygons -- shapes comprised of multiple polygons. The API only accepts one polygon at a time. In subsequent functions, we will rip apart the multipolygons; these functions help put the pieces back together. We don't really call these functions directly, but rather use them as components of the final, all-in-one function _process_entry() toward the end of this post. (This may seem silly to Python programmers who are object-oriented; but we come from a Clojure background, a functional language spoken by the cool kids. Not that we're cool, per se, but we intone the dialect.)
def _force_ring(coords):
"""Returns a nested list of coordinates, ensuring that the
coordinates form a ring with the first and last coordinates the
same.
"""
    if coords[0] != coords[-1]:
        # close the ring by repeating the first coordinate at the end
        return [coords + [coords[0]]]
else:
return [coords]
def _polygon(coords):
"""Converts a list of coordinates to json"""
ring = _force_ring(coords)
return json.dumps({"type": "Polygon", "coordinates": ring})
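To see what these functions produce, feed them a toy triangle (made-up coordinates, not from the Indonesia data). Note how _force_ring closes the ring by repeating the first pair; key order in the output may vary:
triangle = [[102.0, -2.0], [103.0, -2.0], [102.5, -1.0]]
print _polygon(triangle)
# {"type": "Polygon", "coordinates": [[[102.0, -2.0], [103.0, -2.0],
#  [102.5, -1.0], [102.0, -2.0]]]}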
4) Hit the API, hard. The next two functions create a parameter dictionary for a POST request to the GFW API. The default data set queried is the University of Maryland (UMD) tree cover loss and gain. You can adjust the end-point through the endpoint variable in _grab_loss() to query the FORMA alerts as well, although the supplied parameters may have to change. We are interested in getting just the forest cover loss for a specific year, relative to the previous year -- for each year from 2001 through 2012. As soon as the newest UMD data (for 2013) are available, we will be able to query that layer as well. No word quite yet on exactly when that will be. The return object is JSON, like this for all of Indonesia. We translate it into a Python dictionary and then grab just the tree cover loss.
def _params(coords, year):
"""Returns a parameter dictionary for a post request"""
p = _polygon(coords)
return {"begin":year-1, "end":year, "geom":p}
def _grab_loss(coords, year):
"""Returns the loss in hectares of the supplied coordinates and
year from the UMD data set hosted on Earth Engine via GFW API.
"""
endpoint = 'http://gfw-apis.appspot.com/datasets/umd'
    res = requests.post(endpoint, data=_params(coords, year))
return res.json()['loss']
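The API can hiccup on big polygons or flaky connections, so it's worth checking the response before indexing into it. Here is a minimal defensive variant of _grab_loss; the error handling is our own addition, not something the GFW API prescribes:
def _grab_loss_safe(coords, year):
    """Like _grab_loss, but fails loudly on a bad response."""
    endpoint = 'http://gfw-apis.appspot.com/datasets/umd'
    res = requests.post(endpoint, data=_params(coords, year))
    res.raise_for_status()  # surface HTTP-level errors immediately
    body = res.json()
    if 'loss' not in body:
        raise ValueError('unexpected response: %s' % body)
    return body['loss']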
5) Create a time series of loss. The following function accepts the GeoJSON representation of a GADM administrative area from data/map.geojson and returns a list of dictionaries with the tree loss data by year. A time series. Ready for analysis. The administrative area is stored as a GeoJSON multipolygon. The function _process_entry() iterates through each component polygon for each year between 2001 and 2012. (Note that the year range in the function ends at 2013, since Python's range() excludes its end point. Also, we are working to update the GFW call to support multipolygons. We will add an addendum to this tutorial when it's ready; but no matter what, this process will work in perpetuity.) The resulting list of dictionaries is ready for conversion to a Pandas data frame, a convenient format for statistical analysis or persistent storage.
def _process_entry(entry):
"""Converts the supplied entry, a multipolygon in json, into a
list of dictionaries with loss information associated with year
and province identifying information. Ready for quick conversion
to Pandas data frame.
"""
name_1 = entry['properties']['NAME_1']
name_2 = entry['properties']['NAME_2']
coord_set = entry['geometry']['coordinates']
res = []
    for yr in range(2001, 2013):
        # each element of coord_set is one polygon of the multipolygon;
        # its first ring is the outer boundary (holes, if any, are ignored)
        x = [_grab_loss(poly[0], yr) for poly in coord_set]
        xx = {'prov': name_1, 'subprov': name_2, 'year': yr, 'loss': sum(x)}
res.append(xx)
return res
6) Use. Ok, so we have these functions. How do I use them? Well, check it. Suppose I am interested in tracking deforestation for a particular subprovince, say the Sarolangun district (or strictly 'kabupaten') in the Jambi province of Indonesia. I will first filter the result of _read_data(), defined above, so that only the multipolygon representation of the district is returned. Then I will pass that object to _process_entry() and convert the final result to a Pandas data frame. Then, for the love, I will have a beer. This was hard work. I am drinking a beer now, in fact, so my typing is getting worse and worzsase. But whatever, bosses, it's Friday evening.
def _filter_admin(prov, subprov, data='data/map.geojson'):
"""Accepts a province and subprovince name (strings) and a data
source in json format and returns the multipolygon associated with
that administrative unit.
"""
polys = _read_data(data)
def _spec_filter(xx):
x = xx['properties']
return (x['NAME_1'] == prov) & (x['NAME_2'] == subprov)
return filter(_spec_filter, polys)
entries = _filter_admin('Jambi', 'Sarolangun')
# flatten: _process_entry() returns a list of yearly rows per entry
x = [row for entry in entries for row in _process_entry(entry)]
import pandas as pd
print pd.DataFrame(x)
loss prov subprov year
6704.959394 Jambi Sarolangun 2001
13286.370658 Jambi Sarolangun 2002
9332.024050 Jambi Sarolangun 2003
10641.634414 Jambi Sarolangun 2004
12703.430806 Jambi Sarolangun 2005
23783.878311 Jambi Sarolangun 2006
31618.787513 Jambi Sarolangun 2007
33496.859472 Jambi Sarolangun 2008
33783.859539 Jambi Sarolangun 2009
19586.615246 Jambi Sarolangun 2010
11447.544018 Jambi Sarolangun 2011
31439.855632 Jambi Sarolangun 2012
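And if you want the series to outlive the beer, Pandas will happily persist it to disk (the filename is just an example):
df = pd.DataFrame(x)
df.to_csv('data/sarolangun_loss.csv', index=False)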
Yeah, so that's about it. Please e-mail us with any questions at datalab@wri.org. We'd be happy to answer them. And if not, we'd be equally happy to just ignore the e-mail altogether. We're good at that.