Reputation: 53
I'm pretty new to Python (just migrating from R) and would like to convert a list to a pandas DataFrame. After researching the topic I found a lot of answers, but none of them led to the desired result.
The data originates from an API and has the following structure:
[
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "source": {
            "id": "AB",
            "value": "source AB"
        },
        "topics": [
            {
                "id": "11",
                "value": "topic 11 "
            },
            {
                "id": "12",
                "value": "topic 12 "
            }
        ]
    },
    {
        "id": "ID_TWO",
        "name": "NAME_TWO",
        "source": {
            "id": "BC",
            "value": "source BC"
        },
        "topics": [
            {
                "id": "12",
                "value": "topic 12 "
            }
        ]
    }
]
After using requests and json_normalize, I end up with a nice DataFrame, but 'topics' (being a list of dictionaries) stays a Series of lists.
Do you have any suggestions on how to handle this list?
I would also appreciate any comments or advice on whether other data structures are more useful for handling such output in Python (coming from R, I just feel comfortable using DataFrames and lists).
Upvotes: 1
Views: 1575
Reputation: 887
I'll assume you got this far:
import pandas as pd
# In pandas >= 1.0, json_normalize is available as pd.json_normalize;
# the older pandas.io.json import path is deprecated.
df = pd.json_normalize(CopyPastedFromQuestion)
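For reference, here is a small sketch of what that step produces. The variable name `data` is an assumption; it holds the structure pasted in the question. The nested `source` dict is flattened into dotted columns, while `topics` is left as a column of lists.

```python
import pandas as pd

# Assumed name: `data` is the list of records shown in the question.
data = [
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "source": {"id": "AB", "value": "source AB"},
        "topics": [
            {"id": "11", "value": "topic 11 "},
            {"id": "12", "value": "topic 12 "},
        ],
    },
    {
        "id": "ID_TWO",
        "name": "NAME_TWO",
        "source": {"id": "BC", "value": "source BC"},
        "topics": [{"id": "12", "value": "topic 12 "}],
    },
]

df = pd.json_normalize(data)

# Nested dicts become "source.id" / "source.value" columns,
# but each cell of "topics" is still a Python list of dicts.
print(df.columns.tolist())
print(type(df.loc[0, "topics"]))
```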
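You can normalise df.topics again in a loop. However, you need to decide what your result should look like. One possible solution:
frames = []
for _, row in df.iterrows():
    # Skip rows whose 'topics' entry is missing or not a list
    if not isinstance(row['topics'], list):
        continue
    # Flatten this row's list of topic dicts into a small DataFrame
    topics = pd.json_normalize(row['topics'])
    # Record which parent row these topics belong to
    topics['parent_id'] = row['id']
    frames.append(topics)
# DataFrame.append was removed in pandas 2.0, so collect and concat instead
all_topics = pd.concat(frames, ignore_index=True)
# One row per (record, topic); the clashing 'id' columns become id_x / id_y
final = pd.merge(df, all_topics, left_on='id', right_on='parent_id', how='left')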
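As an alternative to the loop, `pd.json_normalize` can expand the nested list itself through its `record_path` and `meta` parameters. The following is only a sketch under some assumptions: the variable name `data`, the `topic_` prefix, and the choice of meta columns are illustrative, and every record is assumed to have a `topics` key. As far as I know, records with an empty `topics` list produce no rows in this approach, unlike the left merge above.

```python
import pandas as pd

# Assumed name: `data` is the list of records shown in the question.
data = [
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "source": {"id": "AB", "value": "source AB"},
        "topics": [
            {"id": "11", "value": "topic 11 "},
            {"id": "12", "value": "topic 12 "},
        ],
    },
    {
        "id": "ID_TWO",
        "name": "NAME_TWO",
        "source": {"id": "BC", "value": "source BC"},
        "topics": [{"id": "12", "value": "topic 12 "}],
    },
]

flat = pd.json_normalize(
    data,
    record_path="topics",  # one output row per element of each "topics" list
    meta=["id", "name", ["source", "id"], ["source", "value"]],  # parent fields to repeat
    record_prefix="topic_",  # avoids a clash between the topic "id" and the parent "id"
)

print(flat)
```

The result has one row per (record, topic) pair, with the topic fields in `topic_id` and `topic_value` and the parent fields repeated alongside them.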
Upvotes: 1