Reputation: 81
This is link used: http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00 It has inflation data of 3 countries over a time period. This is my code so far:
import urllib.request, json
import pandas as pd
import requests
from pandas.io.json import json_normalize
url = r"http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_mmor?precision=1&geo=HU&geo=PL&geo=SK&unit=RCH_M&coicop=CP00"
with urllib.request.urlopen(url) as url:
data = json.loads(url.read().decode())
df2 = pd.DataFrame(pd.json_normalize(data))
print(df)
And this is the output, which is pretty far from what I would need...: I need the HCIP of the 3 countries for each date mentioned in the JSON file.
version label \
0 2.0 HICP (2015 = 100) - monthly data (monthly rate...
href source updated \
0 http://ec.europa.eu/eurostat/wdds/rest/data/v2... Eurostat 2021-08-31
class id size status.0 status.1 ... \
0 dataset [unit, coicop, geo, time] [1, 1, 3, 307] d d ...
dimension.time.category.label.2020M11 dimension.time.category.label.2020M12 \
0 2020M11 2020M12
dimension.time.category.label.2021M01 dimension.time.category.label.2021M02 \
0 2021M01 2021M02
dimension.time.category.label.2021M03 dimension.time.category.label.2021M04 \
0 2021M03 2021M04
dimension.time.category.label.2021M05 dimension.time.category.label.2021M06 \
0 2021M05 2021M06
dimension.time.category.label.2021M07 dimension.time.category.label.2021M08
0 2021M07 2021M08
Any idea what I did wrong? Or how could I import such a format properly?
Upvotes: 0
Views: 194
Reputation: 11405
From what I can understand of your data:
data['id']
specifies the order of dimensionsdata['dimension'][*]['category']
specifies the order of values per dimensiondata['value']
are the raw values.The rest seems to be redundant.
Now the easiest is to extract the values, let’s also make sure the index is correctly sorted:
>>> val = pd.Series(data['value']).rename(index=int).sort_index()
>>> val
0 2.4
1 1.6
2 1.7
3 2.2
4 0.8
...
916 0.2
917 0.6
918 0.4
919 0.5
920 0.2
Length: 921, dtype: float64
Then for each dimension we can construct a simple dataframe from the info, for example for geo
:
>>> pd.DataFrame({key: val for key, val in data['dimension']['geo']['category'].items()})
index label
HU 0 Hungary
PL 1 Poland
SK 2 Slovakia
So sorting by column index
we will have the label
s in desired order, and ordering those according to data['id']
we get:
>>> dimensions = [pd.DataFrame({
... key: val for key, val in data['dimension'][dim]['category'].items()
... }).sort_values('index')['label'].values for dim in data['id']]
>>> dimensions
[array(['Monthly rate of change'], dtype=object), array(['All-items HICP'], dtype=object), array(['Hungary', 'Poland', 'Slovakia'], dtype=object), array(['1996M02', '1996M03', '1996M04', '1996M05', '1996M06', '1996M07',
'1996M08', '1996M09', '1996M10', '1996M11', '1996M12', '1997M01',
'1997M02', '1997M03', '1997M04', '1997M05', '1997M06', '1997M07',
'1997M08', '1997M09', '1997M10', '1997M11', '1997M12', '1998M01',
'1998M02', '1998M03', '1998M04', '1998M05', '1998M06', '1998M07',
'1998M08', '1998M09', '1998M10', '1998M11', '1998M12', '1999M01',
'1999M02', '1999M03', '1999M04', '1999M05', '1999M06', '1999M07',
'1999M08', '1999M09', '1999M10', '1999M11', '1999M12', '2000M01',
'2000M02', '2000M03', '2000M04', '2000M05', '2000M06', '2000M07',
'2000M08', '2000M09', '2000M10', '2000M11', '2000M12', '2001M01',
'2001M02', '2001M03', '2001M04', '2001M05', '2001M06', '2001M07',
'2001M08', '2001M09', '2001M10', '2001M11', '2001M12', '2002M01',
'2002M02', '2002M03', '2002M04', '2002M05', '2002M06', '2002M07',
'2002M08', '2002M09', '2002M10', '2002M11', '2002M12', '2003M01',
'2003M02', '2003M03', '2003M04', '2003M05', '2003M06', '2003M07',
'2003M08', '2003M09', '2003M10', '2003M11', '2003M12', '2004M01',
'2004M02', '2004M03', '2004M04', '2004M05', '2004M06', '2004M07',
'2004M08', '2004M09', '2004M10', '2004M11', '2004M12', '2005M01',
'2005M02', '2005M03', '2005M04', '2005M05', '2005M06', '2005M07',
'2005M08', '2005M09', '2005M10', '2005M11', '2005M12', '2006M01',
'2006M02', '2006M03', '2006M04', '2006M05', '2006M06', '2006M07',
'2006M08', '2006M09', '2006M10', '2006M11', '2006M12', '2007M01',
'2007M02', '2007M03', '2007M04', '2007M05', '2007M06', '2007M07',
'2007M08', '2007M09', '2007M10', '2007M11', '2007M12', '2008M01',
'2008M02', '2008M03', '2008M04', '2008M05', '2008M06', '2008M07',
'2008M08', '2008M09', '2008M10', '2008M11', '2008M12', '2009M01',
'2009M02', '2009M03', '2009M04', '2009M05', '2009M06', '2009M07',
'2009M08', '2009M09', '2009M10', '2009M11', '2009M12', '2010M01',
'2010M02', '2010M03', '2010M04', '2010M05', '2010M06', '2010M07',
'2010M08', '2010M09', '2010M10', '2010M11', '2010M12', '2011M01',
'2011M02', '2011M03', '2011M04', '2011M05', '2011M06', '2011M07',
'2011M08', '2011M09', '2011M10', '2011M11', '2011M12', '2012M01',
'2012M02', '2012M03', '2012M04', '2012M05', '2012M06', '2012M07',
'2012M08', '2012M09', '2012M10', '2012M11', '2012M12', '2013M01',
'2013M02', '2013M03', '2013M04', '2013M05', '2013M06', '2013M07',
'2013M08', '2013M09', '2013M10', '2013M11', '2013M12', '2014M01',
'2014M02', '2014M03', '2014M04', '2014M05', '2014M06', '2014M07',
'2014M08', '2014M09', '2014M10', '2014M11', '2014M12', '2015M01',
'2015M02', '2015M03', '2015M04', '2015M05', '2015M06', '2015M07',
'2015M08', '2015M09', '2015M10', '2015M11', '2015M12', '2016M01',
'2016M02', '2016M03', '2016M04', '2016M05', '2016M06', '2016M07',
'2016M08', '2016M09', '2016M10', '2016M11', '2016M12', '2017M01',
'2017M02', '2017M03', '2017M04', '2017M05', '2017M06', '2017M07',
'2017M08', '2017M09', '2017M10', '2017M11', '2017M12', '2018M01',
'2018M02', '2018M03', '2018M04', '2018M05', '2018M06', '2018M07',
'2018M08', '2018M09', '2018M10', '2018M11', '2018M12', '2019M01',
'2019M02', '2019M03', '2019M04', '2019M05', '2019M06', '2019M07',
'2019M08', '2019M09', '2019M10', '2019M11', '2019M12', '2020M01',
'2020M02', '2020M03', '2020M04', '2020M05', '2020M06', '2020M07',
'2020M08', '2020M09', '2020M10', '2020M11', '2020M12', '2021M01',
'2021M02', '2021M03', '2021M04', '2021M05', '2021M06', '2021M07',
'2021M08'], dtype=object)]
Now simply use this to define the index of your data and you’re done:
>>> val.index = pd.MultiIndex.from_product(dimensions, names=data['id'])
>>> val
unit coicop geo time
Monthly rate of change All-items HICP Hungary 1996M02 2.4
1996M03 1.6
1996M04 1.7
1996M05 2.2
1996M06 0.8
...
Slovakia 2021M04 0.2
2021M05 0.6
2021M06 0.4
2021M07 0.5
2021M08 0.2
Length: 921, dtype: float64
Now with stack
and unstack
you can pivot dimensions as you wish:
>>> val.unstack('geo')
geo Hungary Poland Slovakia
unit coicop time
Monthly rate of change All-items HICP 1996M02 2.4 1.4 0.3
1996M03 1.6 1.6 0.2
1996M04 1.7 2.0 0.3
1996M05 2.2 1.3 0.5
1996M06 0.8 0.9 0.2
... ... ... ...
2021M04 0.8 0.7 0.2
2021M05 0.6 0.4 0.6
2021M06 0.4 0.1 0.4
2021M07 0.6 0.4 0.5
2021M08 0.2 0.3 0.2
[307 rows x 3 columns]
Upvotes: 1