Using the GFW API [update]
grabbing forest loss information, more efficiently
We recently posted an article about collecting forest cover loss information through the GFW API. There, I said that the GFW API could not support multipolygons. That was true at the time, but it no longer is: we have added multipolygon support. This greatly simplifies the post, which spent an inordinate amount of time ripping multipolygons apart into their polygon components and then stitching the results back together.
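To make the change concrete, here is a quick sketch of the two geometry types in GeoJSON. The coordinates below are toy values, not real boundaries; the point is that a MultiPolygon is just a list of polygon coordinate arrays, and the API now accepts it directly.

```python
import json

# A toy GeoJSON Polygon: a single outer ring (coordinates are illustrative).
polygon = {
    "type": "Polygon",
    "coordinates": [[[102.0, -1.0], [103.0, -1.0], [103.0, -2.0], [102.0, -1.0]]],
}

# A toy MultiPolygon: a list of polygon coordinate arrays. Previously each
# component had to be submitted separately; now the whole thing goes at once.
multipolygon = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[102.0, -1.0], [103.0, -1.0], [103.0, -2.0], [102.0, -1.0]]],
        [[[104.0, -3.0], [105.0, -3.0], [105.0, -4.0], [104.0, -3.0]]],
    ],
}

print(json.dumps(multipolygon)[:40])
```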
The carry-over code didn't change much, specifically reading in the data and filtering it. Suppose we have a map of administrative boundaries of Indonesia from GADM, which we will call map.geojson and store in the data subdirectory. We read in the data and set up a function to filter out only the subprovinces we want, supplying a province name and, optionally, a subprovince name.
import itertools
import requests
import json
import pandas as pd

def _read_data(data='data/map.geojson'):
    with open(data) as json_file:
        x = json.load(json_file)
    return x['features']

def _filter_admin(prov, sub=None, data='data/map.geojson'):
    polys = _read_data(data)
    def _spec_filter(xx):
        x = xx['properties']
        if sub is None:
            return x['NAME_1'] == prov
        return x['NAME_1'] == prov and x['NAME_2'] == sub
    return filter(_spec_filter, polys)
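As a standalone sketch of the filtering step, here is the same predicate applied to a toy feature list. The feature dicts and names below are illustrative stand-ins for the GADM file, not real data.

```python
# Toy feature list mimicking the GADM properties structure.
features = [
    {"properties": {"NAME_1": "Jambi", "NAME_2": "Batang Hari"}},
    {"properties": {"NAME_1": "Jambi", "NAME_2": "Bungo"}},
    {"properties": {"NAME_1": "Riau", "NAME_2": "Kampar"}},
]

def spec_filter(feature, prov="Jambi", sub="Batang Hari"):
    # Match on the province name and, when given, the subprovince name.
    props = feature["properties"]
    return props["NAME_1"] == prov and props["NAME_2"] == sub

matches = list(filter(spec_filter, features))
print(len(matches))  # 1
```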
Once we have the features we want, we just build the request from a parameter dictionary. Specifically, we are collecting information for a particular multipolygon (which subsumes the standard, single-ring polygon) and the forest loss between the previous year and the given year.
def _params(geom, year):
    x = json.dumps(geom['geometry'])
    return {"begin": year - 1, "end": year, "geom": x}
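To see what the request parameters look like, here is the same construction applied to a toy feature. The geometry coordinates are illustrative; the `geom` value is the JSON-serialized geometry the API expects.

```python
import json

# A toy feature with a Polygon geometry (coordinates are illustrative).
geom = {"geometry": {"type": "Polygon",
                     "coordinates": [[[102.0, -1.0], [103.0, -1.0],
                                      [103.0, -2.0], [102.0, -1.0]]]}}

def params(geom, year):
    # begin/end bound the loss period; geom is serialized for the POST body.
    return {"begin": year - 1, "end": year, "geom": json.dumps(geom["geometry"])}

p = params(geom, 2005)
print(p["begin"], p["end"])  # 2004 2005
```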
The final function just builds a dictionary with the keys as variable names and the values as the associated attributes, making it exceedingly easy to combine the entries into a Pandas data frame.
def _grab_loss(geom, year):
    endpoint = 'http://gfw-apis.appspot.com/datasets/umd'
    res = requests.post(endpoint, data=_params(geom, year))
    return res.json()['loss']

def _process_entry(entry):
    n1 = entry['properties']['NAME_1']
    n2 = entry['properties']['NAME_2']
    def _res_dict(e, y):
        loss = _grab_loss(e, y)
        return {'prov': n1, 'subprov': n2, 'year': y, 'loss': loss}
    return [_res_dict(entry, yr) for yr in range(2001, 2013)]
Now we just put all the component functions together to generate a results dictionary for each year and each subprovince of interest.
def process_prov(prov_name):
    x = map(_process_entry, _filter_admin(prov_name))
    # flatten the list of per-subprovince lists of dictionaries
    data = list(itertools.chain(*x))
    return pd.DataFrame(data)
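The flattening step is worth a second look: each subprovince yields its own list of per-year dictionaries, and itertools.chain splices those lists into one flat list. A minimal sketch with toy data (the keys and values below are illustrative):

```python
import itertools

# Two subprovinces, each producing a list of per-year result dicts.
per_entry = [
    [{"year": 2001, "loss": 1.0}, {"year": 2002, "loss": 2.0}],
    [{"year": 2001, "loss": 3.0}],
]

# chain(*lists) concatenates the inner lists into a single iterable.
flat = list(itertools.chain(*per_entry))
print(len(flat))  # 3
```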
So what does the output look like? Check it:
>>> xx = process_prov("Jambi")
>>> print(xx[0:10])
        loss   prov      subprov  year
 2168.218476  Jambi  Batang Hari  2001
 6929.433465  Jambi  Batang Hari  2002
 6954.091027  Jambi  Batang Hari  2003
13053.900478  Jambi  Batang Hari  2004
24139.994766  Jambi  Batang Hari  2005
34024.525262  Jambi  Batang Hari  2006
51258.153696  Jambi  Batang Hari  2007
51904.049130  Jambi  Batang Hari  2008
39439.839620  Jambi  Batang Hari  2009
19894.246405  Jambi  Batang Hari  2010
Boom.