Reputation: 2457
I'm trying to use Pandas to convert two columns into a column that is a dictionary representation of the two converted columns.
df = DataFrame({'Metrics' : [[("P", "P"), ("Q","Q")], ("K", "K"), ("Z", "Z")],
'Stage_Name' : ["P", "K", "Z"],
'Block_Name' : ["A", "B", "A"]})
Essentially I want to merge Metrics
and Stage_Name
:
Into another columns called merged
, and for example, the 1st row would be:
{'P': [('P', 'P'), ('Q', 'Q')]}
I know how to convert one row into a dictionary representation, however, I'm not sure how do I do this to all rows without a for loop:
something = df.iloc[[0]].set_index('Stage_Name')['Metrics'].to_dict()
print something
Output: {'P': [('P', 'P'), ('Q', 'Q')]}
Later I would want to aggregate based on Block_Name
, so for a merged column, the result would be two dictionaries added together for Block_Name
: A
.
{'P': [('P', 'P'), ('Q', 'Q')], 'Z' : [('Z', 'Z')] }
For Stage_Name
and Metrics
, I'll just have it appended to a list, which looks like this:
grouped = df.groupby(df['Block_Name'])
df_2 = grouped.aggregate(lambda x: tuple(x))
Can someone point me to the right direction? Thanks!
Upvotes: 5
Views: 12282
Reputation: 109546
df['Merged'] = [{key: val} for key, val in zip(df.Stage_Name, df.Metrics)]
>>> df
Block_Name Metrics Stage_Name Merged
0 A [(P, P), (Q, Q)] P {u'P': [(u'P', u'P'), (u'Q', u'Q')]}
1 B (K, K) K {u'K': (u'K', u'K')}
2 A (Z, Z) Z {u'Z': (u'Z', u'Z')}
And then your code produces the desired result:
grouped = df.groupby(df['Block_Name'])
df_2 = grouped.aggregate(lambda x: tuple(x))[['Metrics', 'Stage_Name']]
>>> df_2
Metrics Stage_Name
Block_Name
A ([(P, P), (Q, Q)], (Z, Z)) (P, Z)
B ((K, K),) (K,)
Timing:
%timeit df['Merged'] = [{key: val} for key, val in zip(df.Stage_Name, df.Metrics)]
10000 loops, best of 3: 162 µs per loop
%timeit df['merged'] = df.apply(lambda row: {row['Stage_Name']:row['Metrics']}, axis=1)
1000 loops, best of 3: 332 µs per loop
Upvotes: 8
Reputation: 394101
IIUC correctly then you use apply
with a lambda
:
In [19]:
df['merged'] = df.apply(lambda row: {row['Stage_Name']:row['Metrics']}, axis=1)
df
Out[19]:
Block_Name Metrics Stage_Name merged
0 A [(P, P), (Q, Q)] P {'P': [('P', 'P'), ('Q', 'Q')]}
1 B (K, K) K {'K': ('K', 'K')}
2 A (Z, Z) Z {'Z': ('Z', 'Z')}
Upvotes: 6