Reputation: 2017
I have:
import numpy as np
import pandas as pd
data = np.array([['id','Date','Pers', 'Comb'],
['Row1','12-12-2016', 'John', [{"name":"asr","value":"no"},{"name":"flt","value":"641"},{"name":"dest","value":"lax"}]],
['Row2','24-12-2016', 'Pete', [{"name":"asr","value":"yes"},{"name":"flt","value":"751"},{"name":"dest","value":"nbo"}]],
['Row2','25-12-2016', 'Sue', [{"name":"asr","value":"no"},{"name":"flt","value":"810"},{"name":"dest","value":"tyo"}]]])
df_org = (pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:]))
df_org
Date Pers Comb
Row1 12-12-2016 John [{u'name': u'asr', u'value': u'no'}, {u'name':...
Row2 24-12-2016 Pete [{u'name': u'asr', u'value': u'yes'}, {u'name'...
Row2 25-12-2016 Sue [{u'name': u'asr', u'value': u'no'}, {u'name':...
I try to extract the name-values from the Comb column and make each name also a new column. Like this:
data = np.array([['id','Date','Pers', 'asr', 'flt', 'dest'],
['Row1','12-12-2016', 'John', "no", "641", "lax"],
['Row2','24-12-2016', 'Pete', "yes","751","nbo"],
['Row2','25-12-2016', 'Sue', "no","810","tyo"]])
df_new = (pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:]))
df_new
Date Pers asr flt dest
Row1 12-12-2016 John no 641 lax
Row2 24-12-2016 Pete yes 751 nbo
Row2 25-12-2016 Sue no 810 tyo
I tried via 'unpacking' the Comb column:
for i in range(0, len(df)):
rowdf = pd.DataFrame(json.loads(df.iloc[i]['Comb']))
print rowdf['name'], rowdf['value']
But then I don't know how to append a new dataframe to the original rows.
Please help to go from df_org to df_new ? Thank you
Upvotes: 2
Views: 3609
Reputation: 6663
Using your dummy data, you can do the following:
def iterate(values):
return pd.Series({x["name"]: x["value"] for x in values})
pd.concat([df_org, df_org.pop("Comb").apply(iterate)], axis=1)
Date Pers asr dest flt
Row1 12-12-2016 John no lax 641
Row2 24-12-2016 Pete yes nbo 751
Row2 25-12-2016 Sue no tyo 810
iterate
extracts the values from the dictionaries while returning them as a pandas Series. apply
with a function that returns Series objects, the result will be casted to a pandas DataFrame. pop
returns the given column and removes it from the dataframe.concat
finally merges your source df (without the comb
column) and the extracted values from Comb
In case you have json strings in your comb
column, you can convert them into regular python objects via json.loads
. Simply change the iterate
function to:
import json
def iterate(values):
return pd.Series({x["name"]: x["value"] for x in json.loads(values)})
Upvotes: 3