Reputation: 11596
I have a df, sample data looks like below:
df
user collaborators
A {'first_name' : 'jack', 'last_name' : 'abc', 'id' : '[email protected]'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : '[email protected]'}
B {'first_name' : 'tom', 'last_name' : 'pwc', 'id' : '[email protected]'}
C {'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}, {'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}
I need to get the first_name
, last_name
and id
in individual columns.
Expected Output:
I first tried to remove the square brackets using:
df['collaborators'].str.strip('{}')
df['collaborators'].str.replace('[{}]', '')
But I got only NaN
as the result, not sure why. I thought of exploding the column after converting it to a list and extracting first_name, last_name and id. But all this looks like lot of steps. Could someone please let me know if there's a shorter way.
Updated DF:
I used to_list
to convert the three columns into list:
df['first_name'] = df['first_name'].to_list()
df['last_name'] = df['last_name'].to_list()
df['id'] = df['id'].to_list()
I then tried below code from SO:
df.set_index('collaborators').apply(lambda x:
x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)
But it didn't work for me.
Upvotes: 2
Views: 597
Reputation: 4543
Use:
(df[0]+",").apply(eval).explode().apply(pd.Series)
Output:
Complete code:
data="""{'first_name' : 'jack', 'last_name' : 'abc', 'id' : '[email protected]'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : '[email protected]'}
{'first_name' : 'tom', 'last_name' : 'pwc', 'id' : '[email protected]'}
{'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}, {'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}"""
df = pd.DataFrame([x for x in data.split('\n')])#
(df[0]+",").apply(eval).explode().apply(pd.Series)
Upvotes: 0
Reputation: 863266
Use ast.literal_eval
with add []
for lists of DataFrames, so possible use DataFrame.explode
with json_normalize
and last append to original with remove helper column new
in DataFrame.pop
:
import ast
df['new'] = df['collaborators'].apply(lambda x: ast.literal_eval('[' + x + ']'))
df1 = df.explode('new').reset_index(drop=True)
df = df1.join(pd.json_normalize(df1.pop('new')))
print (df)
user collaborators first_name \
0 A {'first_name' : 'jack', 'last_name' : 'abc', '... jack
1 A {'first_name' : 'jack', 'last_name' : 'abc', '... john
2 B {'first_name' : 'tom', 'last_name' : 'pwc', 'i... tom
3 C {'first_name' : 'jill', 'last_name' : 'def', '... jill
4 C {'first_name' : 'jill', 'last_name' : 'def', '... jill
last_name id
0 abc [email protected]
1 pqr [email protected]
2 pwc [email protected]
3 def [email protected]
4 def [email protected]
Upvotes: 1