jessirocha
jessirocha

Reputation: 537

How to split a list of dictionaries into multiple columns keeping the same index?

I have a data frame that has as index a timestamp and a column that has list of dictionaries:

    index                   var_A

    2019-08-21 09:05:49    [{"Date1": "Aug 21, 2017 9:09:51 AM","Date2": "Aug 21, 2017 9:09:54 AM","Id": "d5e665e5","num_ins": 108,"num_del": 0, "time": 356} , {"Date1": "Aug 21, 2017 9:09:57 AM","Date2": "Aug 21, 2017 9:09:59 AM","Id": "d5e665e5","num_ins": 218,"num_del": 5, "time": 166}]
    2019-08-21 09:05:59    [{"Date1": "Aug 21, 2017 9:10:01 AM","Date2": "Aug 21, 2017 9:11:54 AM","Id": "d5e665e5","num_ins": 348,"num_del": 72, "time": 3356} , {"Date1": "Aug 21, 2017 9:19:57 AM","Date2": "Aug 21, 2017 9:19:59 AM","Id": "d5e665e5","num_ins": 69,"num_del": 5, "time": 125}, {"Date1": "Aug 21, 2017 9:20:01 AM","Date2": "Aug 21, 2017 9:21:54 AM","Id": "f9e775f9","num_ins": 470,"num_del": 0, "time": 290} ]
    2019-08-21 09:06:04    []

What I wish to achieve is a dataframe like:

    index              Date1                      Date2                    Id      num_ins       num_del    time
2019-08-21 09:05:49   Aug 21, 2017 9:09:51AM   Aug 21, 2017 9:09:54AM   d5e665e5      0           108        356
2019-08-21 09:05:49   Aug 21, 2017 9:09:57AM   Aug 21, 2017 9:09:59AM   d5e665e5      218           5        166
2019-08-21 09:05:59   Aug 21, 2017 9:10:01AM   Aug 21, 2017 9:11:54AM   d5e665e5      348          72       3356
2019-08-21 09:05:59   Aug 21, 2017 9:19:57AM   Aug 21, 2017 9:19:59AM   d5e665e5      69            5        125
2019-08-21 09:05:59   Aug 21, 2017 9:20:01AM   Aug 21, 2017 9:21:54AM   f9e775f9      470           0        290
2019-08-21 09:06:04     NAN                         NAN                    NAN        NAN         NAN        NAN

Upvotes: 2

Views: 1760

Answers (2)

jezrael
jezrael

Reputation: 863631

Loop by each value with enumerate, because duplicated inex values and create DataFrames, then create DataFrame for empty lists and last concat together:

import ast

out = {}
for i, (k, v) in enumerate(df['var_A'].items()):
    df = pd.DataFrame(v)
    if df.empty:
        out[(i, k)] = pd.DataFrame(index=[0], columns=['Id'])
    else:
        out[(i, k)] = df

df = pd.concat(out, sort=True).reset_index(level=[0,2], drop=True)
print (df)
                                       Date1                    Date2  \
2019-08-21 09:05:49  Aug 21, 2017 9:09:51 AM  Aug 21, 2017 9:09:54 AM   
2019-08-21 09:05:49  Aug 21, 2017 9:09:57 AM  Aug 21, 2017 9:09:59 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:10:01 AM  Aug 21, 2017 9:11:54 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:19:57 AM  Aug 21, 2017 9:19:59 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:20:01 AM  Aug 21, 2017 9:21:54 AM   
2019-08-21 09:05:59                      NaN                      NaN   

                           Id  num_del  num_ins    time  
2019-08-21 09:05:49  d5e665e5      0.0    108.0   356.0  
2019-08-21 09:05:49  d5e665e5      5.0    218.0   166.0  
2019-08-21 09:05:59  d5e665e5     72.0    348.0  3356.0  
2019-08-21 09:05:59  d5e665e5      5.0     69.0   125.0  
2019-08-21 09:05:59  f9e775f9      0.0    470.0   290.0  
2019-08-21 09:05:59       NaN      NaN      NaN     NaN  

Upvotes: 1

Madhuri Sangaraju
Madhuri Sangaraju

Reputation: 309

You can use pandas functions stack and concat to do this.

  1. First use stack to unlist the list the column var_A
  2. Then use concat to unnest the dictionary and put it into seperate columns

You can use the following code to do the same. Assuming your dictionary to be df.

Unlist:

df = df.apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

enter image description here

Unnest:

df = pd.concat([df.drop('var_A', axis=1), df['var_A'].apply(pd.Series)], axis=1).drop(0,1)

enter image description here

Upvotes: 1

Related Questions