georg23

Reputation: 53

Convert JSON (including arrays of objects) to pandas DataFrame

I'm pretty new to Python (just migrating from R) and would like to convert a list to a pandas DataFrame. After researching the topic I found a lot of answers, but none of them led to the desired result.

The data originates from an API and has the following structure:

[
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "source": {
            "id": "AB",
            "value": "source AB"
        },
        "topics": [
            {
                "id": "11",
                "value": "topic 11 "
            },
            {
                "id": "12",
                "value": "topic 12 "
            }
        ]
    },
    {
        "id": "ID_TWO",
        "name": "NAME_TWO",
        "source": {
            "id": "BC",
            "value": "source BC"
        },
        "topics": [
            {
                "id": "12",
                "value": "topic 12 "
            }
        ]
    }
]

After using requests and json_normalize, I end up with a nice DataFrame, but the 'topics' column (a list of dictionaries per record) remains a Series of lists.
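
A minimal sketch that reproduces this, assuming the response has already been parsed into a Python list (the name data is chosen here for illustration) and pandas 1.0 or later, where the function is available as pd.json_normalize:

```python
import pandas as pd

# The API response from the question, already parsed into Python objects
data = [
    {"id": "ID_ONE", "name": "NAME_ONE",
     "source": {"id": "AB", "value": "source AB"},
     "topics": [{"id": "11", "value": "topic 11 "},
                {"id": "12", "value": "topic 12 "}]},
    {"id": "ID_TWO", "name": "NAME_TWO",
     "source": {"id": "BC", "value": "source BC"},
     "topics": [{"id": "12", "value": "topic 12 "}]},
]

df = pd.json_normalize(data)

# The nested 'source' dict is flattened into dotted column names ...
print(df.columns.tolist())
# ... but every cell in 'topics' is still a Python list of dicts
print(df["topics"].map(type).unique())
```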

Do you have any suggestions on how to handle this list?

I would also appreciate any comments or advice on whether other data structures are better suited to handling such output in Python (coming from R, I just feel comfortable using DataFrames and lists).

Upvotes: 1

Views: 1575

Answers (1)

Marcel Flygare

Reputation: 887

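I'll assume you got this far:

import pandas as pd
# json_normalize is available as pd.json_normalize since pandas 1.0
df = pd.json_normalize(CopyPastedFromQuestion)

You can normalise df.topics again in a loop. However, you need to decide what your result should look like. A possible solution could be:

frames = []
for _, row in df.iterrows():
    # Skip rows whose 'topics' entry is missing or not a list (e.g. NaN)
    if not isinstance(row['topics'], list):
        continue
    topics = pd.json_normalize(row['topics'])
    topics['parent_id'] = row['id']
    frames.append(topics)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
all_topics = pd.concat(frames, ignore_index=True)
# The two 'id' columns come back as 'id_x' (parent) and 'id_y' (topic)
final = pd.merge(df, all_topics, left_on='id', right_on='parent_id', how='left')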
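
As an alternative to the loop, json_normalize can probably do the whole thing in one call through its record_path and meta parameters. This is only a sketch under a few assumptions: data is an illustrative name for the parsed list from the question, pandas is 1.0 or later, and the "topic." prefix is my own choice (some record_prefix is needed because both levels have an 'id' key).

```python
import pandas as pd

# The API response from the question, already parsed into Python objects
data = [
    {"id": "ID_ONE", "name": "NAME_ONE",
     "source": {"id": "AB", "value": "source AB"},
     "topics": [{"id": "11", "value": "topic 11 "},
                {"id": "12", "value": "topic 12 "}]},
    {"id": "ID_TWO", "name": "NAME_TWO",
     "source": {"id": "BC", "value": "source BC"},
     "topics": [{"id": "12", "value": "topic 12 "}]},
]

flat = pd.json_normalize(
    data,
    record_path="topics",          # one output row per entry in 'topics'
    meta=["id", "name",            # parent fields repeated on every row
          ["source", "id"], ["source", "value"]],
    record_prefix="topic.",        # avoids the clash between the two 'id' keys
)

print(flat)
```

Unlike the left merge above, a record whose topics list is empty contributes no rows here, so which version fits depends on whether such records need to be kept.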

Upvotes: 1
