Reputation: 2112
I have a dataframe (df_full) like so:
|cust_id|address |store_id|email |sales_channel|category|
-------------------------------------------------------------------
|1234567|123 Main St|10SjtT |[email protected]|ecom |direct |
|4567345|345 Main St|10SjtT |[email protected]|instore |direct |
|1569457|876 Main St|51FstT |[email protected]|ecom |direct |
and I would like to combine the last 4 fields into one metadata field that is a dictionary like so:
|cust_id|address |metadata |
-------------------------------------------------------------------------------------------------------------------
|1234567|123 Main St|{'store_id':'10SjtT', 'email':'[email protected]','sales_channel':'ecom', 'category':'direct'} |
|4567345|345 Main St|{'store_id':'10SjtT', 'email':'[email protected]','sales_channel':'instore', 'category':'direct'}|
|1569457|876 Main St|{'store_id':'51FstT', 'email':'[email protected]','sales_channel':'ecom', 'category':'direct'} |
Is that possible? I've seen a few solutions on Stack Overflow, but none of them cover combining more than 2 fields into a dictionary field.
Upvotes: 11
Views: 8524
Reputation: 294228
set_index
df.set_index(['cust_id', 'address']).apply(dict, axis=1).reset_index(name='metadata')
cust_id address metadata
0 1234567 123 Main St {'store_id': '10SjtT', 'email': '[email protected]...
1 4567345 345 Main St {'store_id': '10SjtT', 'email': '[email protected]...
2 1569457 876 Main St {'store_id': '51FstT', 'email': '[email protected]...
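For anyone who wants to run the set_index approach end to end, here is a minimal sketch. The sample frame is rebuilt from the question, and the email values are placeholders I made up because the real ones are not visible in the post.

```python
import pandas as pd

# Sample data modelled on the question; the email values are placeholders.
df = pd.DataFrame({
    "cust_id": [1234567, 4567345, 1569457],
    "address": ["123 Main St", "345 Main St", "876 Main St"],
    "store_id": ["10SjtT", "10SjtT", "51FstT"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "sales_channel": ["ecom", "instore", "ecom"],
    "category": ["direct", "direct", "direct"],
})

# Columns that should stay as ordinary columns; everything else goes into the dict.
key_cols = ["cust_id", "address"]

result = (
    df.set_index(key_cols)          # move the kept columns out of the way
      .apply(dict, axis=1)          # turn each remaining row into a dict
      .reset_index(name="metadata") # bring the kept columns back
)
```

Because the kept columns sit in the index while apply runs, they never end up inside the dictionaries.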
dat = [(c, a, dict(zip([*df][2:], m))) for c, a, *m in zip(*map(df.get, df))]
pd.DataFrame(dat, df.index, [*df][:2] + ['metadata'])
cust_id address metadata
0 1234567 123 Main St {'store_id': '10SjtT', 'email': '[email protected]...
1 4567345 345 Main St {'store_id': '10SjtT', 'email': '[email protected]...
2 1569457 876 Main St {'store_id': '51FstT', 'email': '[email protected]...
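The comprehension above is fairly dense, so here is a spelled-out sketch of what I understand it to do: walk the rows as tuples, keep the first two values, and zip the rest with their column names. The names n_keep, keep and meta_cols are mine, and the email values are placeholders.

```python
import pandas as pd

# Sample data modelled on the question; the email values are placeholders.
df = pd.DataFrame({
    "cust_id": [1234567, 4567345, 1569457],
    "address": ["123 Main St", "345 Main St", "876 Main St"],
    "store_id": ["10SjtT", "10SjtT", "51FstT"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "sales_channel": ["ecom", "instore", "ecom"],
    "category": ["direct", "direct", "direct"],
})

n_keep = 2  # assumption: the first two columns stay as they are
columns = list(df.columns)
keep, meta_cols = columns[:n_keep], columns[n_keep:]

data = []
# zip over the columns yields one tuple of values per row
for values in zip(*(df[col] for col in columns)):
    kept, meta = values[:n_keep], values[n_keep:]
    data.append((*kept, dict(zip(meta_cols, meta))))

result = pd.DataFrame(data, index=df.index, columns=keep + ["metadata"])
```

This relies on column position, so it only fits when the columns to keep come first in the frame.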
Upvotes: 2
Reputation: 2643
Use to_dict:
columns = ['store_id', 'email', 'sales_channel', 'category']
df['metadata'] = df[columns].to_dict(orient='records')
And if you want to drop the original columns:
df = df.drop(columns=columns)
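Putting both steps together, a runnable sketch could look like the following. The sample frame is rebuilt from the question with placeholder email values, and I work on a copy (out, a name of my choosing) so the original frame is left untouched.

```python
import pandas as pd

# Sample data modelled on the question; the email values are placeholders.
df = pd.DataFrame({
    "cust_id": [1234567, 4567345, 1569457],
    "address": ["123 Main St", "345 Main St", "876 Main St"],
    "store_id": ["10SjtT", "10SjtT", "51FstT"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "sales_channel": ["ecom", "instore", "ecom"],
    "category": ["direct", "direct", "direct"],
})

meta_cols = ["store_id", "email", "sales_channel", "category"]

out = df.copy()
# to_dict(orient="records") returns one dict per row, in row order
out["metadata"] = out[meta_cols].to_dict(orient="records")
# the source columns are now redundant
out = out.drop(columns=meta_cols)
```

Since the columns are selected by name, this works for any number of fields and does not depend on their position in the frame.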
Upvotes: 20