Reputation: 107
I have a dataframe of the form:
id  date      area1                area2
01  20181010  {'a': 10, 'b': 15}   {'a': 20, 'c': 13}
01  20181010  {'c': 17}            {'b': 12}
02  20180506  {'a': 2, 'b': 3}     {'c': 4}
02  20180506  NaN                  {'a': 18}
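For reference, the data can be reproduced with something like this (assuming the area columns hold plain Python dicts as objects, with np.nan for the missing cell):
import numpy as np
import pandas as pd

# Rebuild the example frame; 'area1'/'area2' are object columns of dicts
df = pd.DataFrame({
    'id': ['01', '01', '02', '02'],
    'date': [20181010, 20181010, 20180506, 20180506],
    'area1': [{'a': 10, 'b': 15}, {'c': 17}, {'a': 2, 'b': 3}, np.nan],
    'area2': [{'a': 20, 'c': 13}, {'b': 12}, {'c': 4}, {'a': 18}],
})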
I would like to group all rows with matching 'id' and 'date' while merging the dictionaries in 'area1' and 'area2'. That is, I would like to get:
id  date      area1                         area2
01  20181010  {'a': 10, 'b': 15, 'c': 17}   {'a': 20, 'c': 13, 'b': 12}
02  20180506  {'a': 2, 'b': 3}              {'c': 4, 'a': 18}
First I was trying something like:
merged_df = df.groupby(["id", "date"],as_index=False).agg({'area1':'first', 'area2': 'first'})
Obviously this only keeps the first dict of area1 and area2. But if I understand correctly, it is possible to pass a function to agg, so would it be possible to merge the dictionaries that way? I just cannot figure out how to tell it to take the next dict and merge it in (taking into account that it might not exist and be NaN).
Thanks a lot!
Ahh, also, it would be great if the solution is not super slow, since I have to run it on a large dataset :/
Upvotes: 7
Views: 3446
Reputation: 164683
You are nearly there. You just need to use a custom function which merges dictionaries across non-null series values:
def merge_dicts(x):
    # Drop NaN cells, then fold the remaining dicts into one (later keys win)
    return {k: v for d in x.dropna() for k, v in d.items()}
res = df.groupby(['id', 'date'], as_index=False).agg(merge_dicts)
print(res)
   id      date                        area1                        area2
0  01  20181010  {'a': 10, 'b': 15, 'c': 17}  {'a': 20, 'c': 13, 'b': 12}
1  02  20180506             {'a': 2, 'b': 3}            {'c': 4, 'a': 18}
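If you prefer the per-column dict style from your original attempt, the same function can simply replace 'first' for each column; this is just an alternative spelling of the same aggregation, not a faster path, since agg still calls the Python function once per group per column:
# Same result, with the aggregation spelled out per column
res = df.groupby(['id', 'date'], as_index=False).agg(
    {'area1': merge_dicts, 'area2': merge_dicts}
)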
Upvotes: 7