ArtDijk
ArtDijk

Reputation: 2017

How to 'unpack' pandas dataframe column

I have:

import numpy as np
import pandas as pd
data = np.array([['id','Date','Pers', 'Comb'],
                ['Row1','12-12-2016', 'John', [{"name":"asr","value":"no"},{"name":"flt","value":"641"},{"name":"dest","value":"lax"}]],
                ['Row2','24-12-2016', 'Pete', [{"name":"asr","value":"yes"},{"name":"flt","value":"751"},{"name":"dest","value":"nbo"}]],
                ['Row2','25-12-2016', 'Sue', [{"name":"asr","value":"no"},{"name":"flt","value":"810"},{"name":"dest","value":"tyo"}]]])
df_org = (pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))
df_org

        Date        Pers    Comb
Row1    12-12-2016  John    [{u'name': u'asr', u'value': u'no'}, {u'name':...
Row2    24-12-2016  Pete    [{u'name': u'asr', u'value': u'yes'}, {u'name'...
Row2    25-12-2016  Sue [{u'name': u'asr', u'value': u'no'}, {u'name':...

I try to extract the name-values from the Comb column and make each name also a new column. Like this:

data = np.array([['id','Date','Pers', 'asr', 'flt', 'dest'],
                ['Row1','12-12-2016', 'John', "no", "641", "lax"],
                ['Row2','24-12-2016', 'Pete', "yes","751","nbo"],
                ['Row2','25-12-2016', 'Sue', "no","810","tyo"]])
df_new = (pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:]))
df_new

        Date        Pers    asr flt dest
Row1    12-12-2016  John    no  641 lax
Row2    24-12-2016  Pete    yes 751 nbo
Row2    25-12-2016  Sue no  810 tyo

I tried via 'unpacking' the Comb column:

for i in range(0, len(df)):
    rowdf = pd.DataFrame(json.loads(df.iloc[i]['Comb']))
    print rowdf['name'], rowdf['value']

But then I don't know how to append a new dataframe to the original rows.

Please help to go from df_org to df_new ? Thank you

Upvotes: 2

Views: 3609

Answers (1)

pansen
pansen

Reputation: 6663

Using your dummy data, you can do the following:

def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in values})

pd.concat([df_org, df_org.pop("Comb").apply(iterate)], axis=1)

        Date        Pers    asr     dest    flt
Row1    12-12-2016  John    no      lax     641
Row2    24-12-2016  Pete    yes     nbo     751
Row2    25-12-2016  Sue     no      tyo     810
  • The helper function iterate extracts the values from the dictionaries while returning them as a pandas Series.
  • When using apply with a function that returns Series objects, the result will be casted to a pandas DataFrame.
  • pop returns the given column and removes it from the dataframe.
  • concat finally merges your source df (without the comb column) and the extracted values from Comb

Edit Json Data

In case you have json strings in your comb column, you can convert them into regular python objects via json.loads. Simply change the iterate function to:

import json

def iterate(values):
    return pd.Series({x["name"]: x["value"] for x in json.loads(values)})

Upvotes: 3

Related Questions