Roger Bücker
Roger Bücker

Reputation: 21

How to flatten this JSON to a Pandas Dataframe

I try to flatten this JSON Message to a Pandas Dataframe. The JSON has a field text containing the information I need as separate columns. The first text.text is the username, the second text.text is the item_name and the third text.text is the location I'm interested in.

Working Example:

import pandas as pd

data = '''
          [{
    "id": 735221,
    "type": "message",
    "date": "2021-01-01T00:03:58",
    "from": "Lost Relics Overseer",
    "from_id": "channel1462123539",
    "text": [
        "💎 ",
        {
            "type": "bold",
            "text": "wakaiacerse1"
        },
        " located ",
        {
            "type": "bold",
            "text": "Emenire Bamboo (R)"
        },
        " at ",
        {
            "type": "bold",
            "text": "Ash Woods"
        },
        "!"
    ]
}, {
    "id": 735222,
    "type": "message",
    "date": "2021-01-01T00:05:40",
    "from": "Lost Relics Overseer",
    "from_id": "channel1462123539",
    "text": [
        "💎 ",
        {
            "type": "bold",
            "text": "oH3CKZ"
        },
        " stumbled across ",
        {
            "type": "bold",
            "text": "Messantic Crab (R)"
        },
        " at ",
        {
            "type": "bold",
            "text": "Trayl Woods"
        },
        "!"
    ]
}, {
    "id": 735223,
    "type": "message",
    "date": "2021-01-01T00:09:58",
    "from": "Lost Relics Overseer",
    "from_id": "channel1462123539",
    "text": [
        "💎 ",
        {
            "type": "bold",
            "text": "Crypton"
        },
        " stumbled across ",
        {
            "type": "bold",
            "text": "Turquoise Teardrop (R)"
        },
        " at ",
        {
            "type": "bold",
            "text": "Odenspire Sands"
        },
        "!"
    ]
}, {
    "id": 735239,
    "type": "message",
    "date": "2021-01-01T00:37:27",
    "from": "Lost Relics Overseer",
    "from_id": "channel1462123539",
    "text": [
        "💎 ",
        {
            "type": "bold",
            "text": "Devotion"
        },
        " located ",
        {
            "type": "bold",
            "text": "Emenire Bamboo (R)"
        },
        " at ",
        {
            "type": "bold",
            "text": "Ash Woods"
        },
        "!"
    ]
}]
'''

df = pd.read_json(data)
print(df)

How can I extract the information from 'text' to separate columns for my pandas dataframe?

Upvotes: 0

Views: 102

Answers (1)

Anurag Dabas
Anurag Dabas

Reputation: 24314

try explode()+map()+groupby():

df['text']=(df['text'].explode()
                      .map(lambda x:x.get('text') if isinstance(x,dict) else float('NaN'))
                      .dropna().groupby(level=0).agg(','.join))

Update:

If you want 3 seperate columns then instead of aggregrating join aggregrate list and if needed unique values then agg set:

s=(df['text'].explode()
             .map(lambda x:x.get('text') if isinstance(x,dict) else float('NaN'))
             .dropna().groupby(level=0).agg(list))

#Finally make a dataframe out of it and join it to your initial DateFrame:

df=df.join(pd.DataFrame(s.to_list()).rename(columns=lambda x:f"text{x+1}"))

Upvotes: 1

Related Questions