Reputation: 21
I try to flatten this JSON Message to a Pandas Dataframe. The JSON has a field text containing the information I need as separate columns. The first text.text is the username, the second text.text is the item_name and the third text.text is the location I'm interested in.
Working Example:
import pandas as pd
data = '''
[{
"id": 735221,
"type": "message",
"date": "2021-01-01T00:03:58",
"from": "Lost Relics Overseer",
"from_id": "channel1462123539",
"text": [
"💎 ",
{
"type": "bold",
"text": "wakaiacerse1"
},
" located ",
{
"type": "bold",
"text": "Emenire Bamboo (R)"
},
" at ",
{
"type": "bold",
"text": "Ash Woods"
},
"!"
]
}, {
"id": 735222,
"type": "message",
"date": "2021-01-01T00:05:40",
"from": "Lost Relics Overseer",
"from_id": "channel1462123539",
"text": [
"💎 ",
{
"type": "bold",
"text": "oH3CKZ"
},
" stumbled across ",
{
"type": "bold",
"text": "Messantic Crab (R)"
},
" at ",
{
"type": "bold",
"text": "Trayl Woods"
},
"!"
]
}, {
"id": 735223,
"type": "message",
"date": "2021-01-01T00:09:58",
"from": "Lost Relics Overseer",
"from_id": "channel1462123539",
"text": [
"💎 ",
{
"type": "bold",
"text": "Crypton"
},
" stumbled across ",
{
"type": "bold",
"text": "Turquoise Teardrop (R)"
},
" at ",
{
"type": "bold",
"text": "Odenspire Sands"
},
"!"
]
}, {
"id": 735239,
"type": "message",
"date": "2021-01-01T00:37:27",
"from": "Lost Relics Overseer",
"from_id": "channel1462123539",
"text": [
"💎 ",
{
"type": "bold",
"text": "Devotion"
},
" located ",
{
"type": "bold",
"text": "Emenire Bamboo (R)"
},
" at ",
{
"type": "bold",
"text": "Ash Woods"
},
"!"
]
}]
'''
df = pd.read_json(data)
print(df)
How can I extract the information from 'text' to separate columns for my pandas dataframe?
Upvotes: 0
Views: 102
Reputation: 24314
try explode()
+map()
+groupby()
:
df['text']=(df['text'].explode()
.map(lambda x:x.get('text') if isinstance(x,dict) else float('NaN'))
.dropna().groupby(level=0).agg(','.join))
Update:
If you want 3 seperate columns then instead of aggregrating join
aggregrate list
and if needed unique values then agg set
:
s=(df['text'].explode()
.map(lambda x:x.get('text') if isinstance(x,dict) else float('NaN'))
.dropna().groupby(level=0).agg(list))
#Finally make a dataframe out of it and join it to your initial DateFrame:
df=df.join(pd.DataFrame(s.to_list()).rename(columns=lambda x:f"text{x+1}"))
Upvotes: 1