Karthik S
Karthik S

Reputation: 11596

How to expand array type column in Pandas to individual columns

I have a df, sample data looks like below:

df

user    collaborators
A       {'first_name' : 'jack', 'last_name' : 'abc', 'id' : '[email protected]'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : '[email protected]'}
B       {'first_name' : 'tom', 'last_name' : 'pwc', 'id' : '[email protected]'}
C       {'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}, {'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}

I need to get the first_name, last_name and id in individual columns. Expected Output:

enter image description here

I first tried to remove the square brackets using:

df['collaborators'].str.strip('{}')
df['collaborators'].str.replace('[{}]', '')

But I got only NaN as the result, not sure why. I thought of exploding the column after converting it to a list and extracting first_name, last_name and id. But all this looks like lot of steps. Could someone please let me know if there's a shorter way.

Updated DF:

enter image description here

I used to_list to convert the three columns into list:

df['first_name'] = df['first_name'].to_list()
df['last_name'] = df['last_name'].to_list()
df['id'] = df['id'].to_list()

I then tried below code from SO:

 df.set_index('collaborators').apply(lambda x:
    x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

But it didn't work for me.

Upvotes: 2

Views: 597

Answers (2)

keramat
keramat

Reputation: 4543

Use:

(df[0]+",").apply(eval).explode().apply(pd.Series)

Output:

enter image description here

Complete code:

data="""{'first_name' : 'jack', 'last_name' : 'abc', 'id' : '[email protected]'},  {'first_name' : 'john', 'last_name' : 'pqr', 'id' : '[email protected]'}
{'first_name' : 'tom', 'last_name' : 'pwc', 'id' : '[email protected]'}
{'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'},  {'first_name' : 'jill', 'last_name' : 'def', 'id' : '[email protected]'}"""
df = pd.DataFrame([x for x in data.split('\n')])#
(df[0]+",").apply(eval).explode().apply(pd.Series)

Upvotes: 0

jezrael
jezrael

Reputation: 863266

Use ast.literal_eval with add [] for lists of DataFrames, so possible use DataFrame.explode with json_normalize and last append to original with remove helper column new in DataFrame.pop:

import ast

df['new'] = df['collaborators'].apply(lambda x: ast.literal_eval('[' + x + ']'))

df1 = df.explode('new').reset_index(drop=True)

df = df1.join(pd.json_normalize(df1.pop('new')))
print (df)

  user                                      collaborators first_name  \
0    A  {'first_name' : 'jack', 'last_name' : 'abc', '...       jack   
1    A  {'first_name' : 'jack', 'last_name' : 'abc', '...       john   
2    B  {'first_name' : 'tom', 'last_name' : 'pwc', 'i...        tom   
3    C  {'first_name' : 'jill', 'last_name' : 'def', '...       jill   
4    C  {'first_name' : 'jill', 'last_name' : 'def', '...       jill   

  last_name             id  
0       abc  [email protected]  
1       pqr  [email protected]  
2       pwc  [email protected]  
3       def  [email protected]  
4       def  [email protected]  

Upvotes: 1

Related Questions