Creating a dictionary from multiple columns in a group by (pandas)

Question

My data frame has a column 'id_one', and each 'id_one' value can have multiple 'id_two' values. Each 'id_two' also has a number of descriptive characteristics stored in other columns. Here's an example dataset.

import pandas as pd

d = {'id_one' : pd.Series([123, 123, 123]),
     'id_two' : pd.Series([456, 567, 678]),
     'descriptor' : pd.Series(['blue', 'yellow', 'green'])}

df = pd.DataFrame(d)

I need to reshape the data frame to one row per 'id_one': 'col a' should hold the 'id_one' value, and 'col b' should hold a dictionary whose keys are that group's 'id_two' values and whose values are the corresponding descriptors.

Any help would be appreciated, thank you.

cmaher · Accepted Answer

Is this what you're looking for?

(df.groupby('id_one')
   .apply(lambda x: dict(zip(x['id_two'], x['descriptor'])))
   .reset_index()
   .rename(columns={"id_one": "col a", 0: "col b"}))
#    col a                                          col b
# 0    123  {456: u'blue', 678: u'green', 567: u'yellow'}
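
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It is not the one-liner above: it swaps groupby.apply for an explicit loop over the groups, which may be easier to follow and does not depend on how apply names its output column. The second 'id_one' value (999) and its row are made up, added only to show that each 'id_one' gets its own row.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical extra group (999)
df = pd.DataFrame({
    "id_one": [123, 123, 123, 999],
    "id_two": [456, 567, 678, 789],
    "descriptor": ["blue", "yellow", "green", "red"],
})

# Iterate over the groups and build one record per id_one:
# 'col a' is the group key, 'col b' maps id_two -> descriptor
rows = [
    {"col a": id_one, "col b": dict(zip(group["id_two"], group["descriptor"]))}
    for id_one, group in df.groupby("id_one")
]

result = pd.DataFrame(rows, columns=["col a", "col b"])
print(result)
```

If an 'id_two' repeats within the same 'id_one', the dictionary keeps only the last descriptor for it, so this assumes 'id_two' is unique within each group.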
