Shabari nath k
Shabari nath k

Reputation: 890

How to explode/split a nested list, inside a list inside a pandas dataframe column and make separate columns out of them?

I have a dataframe. I want to split the Options column into id, AUD,ud.

id col1 col2  Options
1  A    B   [{'id':25,'X': {'AUD': None, 'ud':0}}] 
2  C    D   [{'id':27,'X': {'AUD': None, 'ud':0}}] 
3  E    F   [{'id':28,'X': {'AUD': None, 'ud':0}}] 
4  G    H   [{'id':29,'X': {'AUD': None, 'ud':0}}] 

Expected output dataframe:

id col1 col2 id   Aud   ud
1  A      B   25  None  0
2  C      D   27  None  0
3  E      F   28  None  0
4  G      H   29  None  0

How do you go about it using python3.6 and pandas dataframe?

Upvotes: 2

Views: 790

Answers (2)

jezrael
jezrael

Reputation: 863611

Use list comprehension with json_normalize for get DataFrames and join together by concat, also added DataFrame.add_prefix for avoid duplicated columns names:

from pandas.io.json import json_normalize
import ast

L = [json_normalize(x) for x in df.pop('Options')]
#if strings instead dicts
#L = [json_normalize(ast.literal_eval(x)) for x in df.pop('Options')]

df = df.join(pd.concat(L, ignore_index=True, sort=False).add_prefix('opt_'))
print (df)
   id col1 col2  opt_id opt_X.AUD  opt_X.ud
0   1    A    B      25      None         0
1   2    C    D      27      None         0
2   3    E    F      28      None         0
3   4    G    H      29      None         0

Another solution with extract X values of nested dictionaries:

L = [{k: v for y in ast.literal_eval(x) for k, v in {**y.pop('X'), **y}.items()} 
           for x in df.pop('Options')]

df = df.join(pd.DataFrame(L, index=df.index).add_prefix('opt_'))
print (df)
   id col1 col2 opt_AUD  opt_ud  opt_id
0   1    A    B    None       0      25
1   2    C    D    None       0      27
2   3    E    F    None       0      28
3   4    G    H    None       0      29

Upvotes: 6

Mark
Mark

Reputation: 984

Try this:

for dit in df['Options'].iteritems():
    df.loc[dit[0],'id'] = dit[1][0]['id']
    df.loc[dit[0],'Aud'] = dit[1][0]['X']['AUD']
    df.loc[dit[0],'ud'] = dit[1][0]['X']['ud']

Upvotes: 1

Related Questions