Reputation: 3208
Assuming I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'events': [ [{'event_text': 'hello1'}, {'event_text': 'hello2'}],
[{'event_text': 'whats up?'}],
[{'event_text': 'all good'}, {'event_text': 'bye'}] ]})
print(df)
events
0 [{'event_text': 'hello1'}, {'event_text': 'hel...
1 [{'event_text': 'whats up?'}]
2 [{'event_text': 'all good'}, {'event_text': 'b...
I'm trying to extract all texts into a single column like so:
0 hello1
1 hello2
2 whats up?
3 all good
4 bye
I think the solution involves json_normalize. I've tried the following:
from pandas.io.json import json_normalize
df['events'].apply(json_normalize)
But it yielded the following results:
0 event_text
0 hello1
1 hello2
1 event_text
0 whats up?
2 event_text
0 all good
1 bye
Is there a Pythonic way to handle this?
Upvotes: 3
Views: 1399
Reputation: 863166
Flatten the nested lists with a list comprehension, use get to select event_text from each dict, and pass the result to Series:
s = pd.Series([y.get('event_text') for x in df['events'] for y in x])
print(s)
0 hello1
1 hello2
2 whats up?
3 all good
4 bye
dtype: object
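If you would rather stay within pandas methods, one possible alternative is sketched below. It assumes pandas 0.25 or later (for Series.explode) and relies on Series.str.get accepting a dict key; the name s2 is just an illustrative choice, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'events': [
    [{'event_text': 'hello1'}, {'event_text': 'hello2'}],
    [{'event_text': 'whats up?'}],
    [{'event_text': 'all good'}, {'event_text': 'bye'}],
]})

# explode() gives each dict its own row (the original index is repeated),
# .str.get() looks up the key in every dict,
# reset_index(drop=True) renumbers the rows from 0.
s2 = df['events'].explode().str.get('event_text').reset_index(drop=True)
print(s2)
```

Unlike the list comprehension, this keeps a NaN row for any empty list in events, so the comprehension is probably the simpler choice unless you need to chain further pandas operations.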
Upvotes: 8