Reputation: 746
My data frame has a column 'id_one', and each 'id_one' value can have multiple 'id_two' values. Each 'id_two' also has a number of descriptive characteristics stored in other columns. Here's an example dataset:
import pandas as pd

d = {'id_one': pd.Series([123, 123, 123]),
     'id_two': pd.Series([456, 567, 678]),
     'descriptor': pd.Series(['blue', 'yellow', 'green'])}
df = pd.DataFrame(d)
I need to reshape the data frame to one row per 'id_one', where 'col a' holds the 'id_one' value and 'col b' holds a dictionary whose keys are that id's 'id_two' values and whose values are the corresponding descriptors.
Any help would be appreciated, thank you.
Upvotes: 0
Views: 2940
Reputation: 5225
Is this what you're looking for?
(df.groupby('id_one')
   .apply(lambda x: dict(zip(x['id_two'], x['descriptor'])))
   .reset_index()
   .rename(columns={'id_one': 'col a', 0: 'col b'}))
#    col a                                          col b
# 0    123  {456: u'blue', 678: u'green', 567: u'yellow'}
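If the apply call feels opaque, here is a rough sketch of the same idea written as an explicit loop over the groups. It should give the same kind of result, but treat it as an illustration rather than the recommended way. The second id_one (999) and its row are made up just to show that you get one output row per id; `rows` and `result` are names I picked for the example.

```python
import pandas as pd

# Example data from the question, plus one hypothetical extra id_one (999)
df = pd.DataFrame({
    "id_one": [123, 123, 123, 999],
    "id_two": [456, 567, 678, 111],
    "descriptor": ["blue", "yellow", "green", "red"],
})

# Build one record per id_one: the id itself and a dict
# mapping each of its id_two values to the matching descriptor
rows = [
    {"col a": id_one, "col b": dict(zip(group["id_two"], group["descriptor"]))}
    for id_one, group in df.groupby("id_one")
]

result = pd.DataFrame(rows, columns=["col a", "col b"])
print(result)
```

Note that if an id_two repeats within the same id_one, the later descriptor overwrites the earlier one, since dictionary keys are unique. The same is true of the one-liner above.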
Upvotes: 3