Reputation: 131
I am working with Python 3.7.0, and I am currently facing a problem that I cannot find a solution. Consider the following single entry of data from an API:
data = {'publications': [{'title': 'The effect of land‐use changes on the hydrological behaviour of Histic Andosols in south Ecuador',
'author_affiliations': [[{'first_name': 'W.',
'last_name': 'Buytaert',
'researcher_id': 'ur.01136506420.02',
'affiliations': [{'id': 'grid.442123.2',
'name': 'University of Cuenca',
'org_types': ['Education'],
'city': 'Cuenca',
'city_id': 3658666,
'country': 'Ecuador',
'country_code': 'EC',
'state': None,
'state_code': None},
{'id': 'grid.5596.f',
'name': 'KU Leuven',
'org_types': ['Education'],
'city': 'Leuven',
'city_id': 2792482,
'country': 'Belgium',
'country_code': 'BE',
'state': None,
'state_code': None}]},
{'first_name': 'G.',
'last_name': 'Wyseure',
'researcher_id': 'ur.012246446667.91',
'affiliations': [{'id': 'grid.5596.f',
'name': 'KU Leuven',
'org_types': ['Education'],
'city': 'Leuven',
'city_id': 2792482,
'country': 'Belgium',
'country_code': 'BE',
'state': None,
'state_code': None}]},
{'first_name': 'B.',
'last_name': 'De Bièvre',
'researcher_id': 'ur.013305075217.11',
'affiliations': [{'id': 'grid.442123.2',
'name': 'University of Cuenca',
'org_types': ['Education'],
'city': 'Cuenca',
'city_id': 3658666,
'country': 'Ecuador',
'country_code': 'EC',
'state': None,
'state_code': None}]},
{'first_name': 'J.',
'last_name': 'Deckers',
'researcher_id': 'ur.0761456127.40',
'affiliations': [{'id': 'grid.5596.f',
'name': 'KU Leuven',
'org_types': ['Education'],
'city': 'Leuven',
'city_id': 2792482,
'country': 'Belgium',
'country_code': 'BE',
'state': None,
'state_code': None}]}]],
'FOR': [{'id': '2539',
'name': '0406 Physical Geography and Environmental Geoscience'}],
'issn': ['0885-6087', '1099-1085'],
'journal': {'id': 'jour.1043737', 'title': 'Hydrological Processes'},
'type': 'article',
'research_org_country_names': ['Belgium', 'Ecuador'],
'doi': '10.1002/hyp.5867',
'year': 2005,
'times_cited': 72}],
'_stats': {'total_count': 957, 'limit': 1, 'offset': 0}}
My aim is to construct a data frame where the nested dictionaries end up combined (separated by commas), and in other cases, with more complicated combinations. For giving an idea, what I am looking for, is something with the following structure:
In the case of the 'author_affiliations' column, this is the trickiest one. Taking into account the entry that I put above, for the first author, this shall be like 'W. Buytaert (University of Cuenca, Ecuador; KU Leuven, Belgium)' and so on...
So far, my attempts have miserably failed. The nearest I got, has been with this very naive code:
from pandas.io.json import json_normalize
data = data['publications']
df = json_normalize(data)
I know that there are a lot of questions alike to mine. However, I haven't found something similar (or at least I haven't noticed that easily). I appreciate your comments and help.
EDIT
As suggested in a comment, I put the desired output as text:
FOR | author_affiliations |doi | issn | journal.id | journal.title | countries | times_cited | title | type | year
0406 Physical... | W. Buytaert (University of Cuenca, Ecuador;KU Leuven, Belgium), ... | 10.1002/hyp.5867| 0885-6087,1099-1085 | jour.1043737 | Hydrological Processes | Belgium,Ecuador | 72 | The effect ... | article | 2005
Upvotes: 2
Views: 728
Reputation: 71580
Try using nested_to_record
, then convert to pandas data-frame, then manually change the columns:
from pandas.io import json
data = data['publications']
df = json.nested_to_record(data)
df=pd.DataFrame(df)
df['FOR']=df['FOR'].tolist()[0][0]['name']
df['author_affiliations']=','.join([i[0]['first_name']+i[0]['last_name']+' ('+i[0]['affiliations'][0]['name']+', '+i[0]['affiliations'][0]['country']+';'+i[0]['affiliations'][1]['name']+', '+i[0]['affiliations'][1]['country'] for i in df['author_affiliations'][0]])
df['issn']=','.join(df['issn'][0])
df['research_org_country_names']=','.join(df['research_org_country_names'][0])
And now:
print(df)
Is (shown as an image, jupyter notebook result, because too big for my idle):
Note: json.nested_to_record
produces error, do json.json_normalize
instead
Upvotes: 1