Pandas grouping and summing just a certain column

Question

Below is a minimal example showing the problem I am facing. Let the initial state be the following (I only use a list of dictionaries for demonstration purposes):

import pandas as pd

A = [{'D': '16.5.2013', 'A': 1, 'B': 0.0, 'C': 2},
     {'D': '16.5.2013', 'A': 1, 'B': 0.0, 'C': 4},
     {'D': '16.5.2013', 'A': 1, 'B': 0.5, 'C': 7}]
df = pd.DataFrame(A)
>>> df
   A    B  C          D
0  1  0.0  2  16.5.2013
1  1  0.0  4  16.5.2013
2  1  0.5  7  16.5.2013

How do I get from df to df_new, which is:

A_new = [{'D': '16.5.2013', 'A':1, 'B': 0.0, 'C': 6}, {'D': '16.5.2013', 'A':1, 'B': 0.5, 'C': 7}]
df_new = pd.DataFrame(A_new)

>>> df_new
   A    B  C          D
0  1  0.0  6  16.5.2013
1  1  0.5  7  16.5.2013

The first and second rows of column 'C' are summed because 'B' is the same for these two rows. Everything else is left as it is: column 'A' is not summed and column 'D' is unchanged. How do I do that, assuming I only have df and want to get df_new? I would really like to find an elegant solution if possible.

Thanks in advance.

Woody Pride · Accepted Answer

If A and D are always equal within each group of B, then you can just group by A, B, and D, and sum C:

df.groupby(['A', 'B', 'D'], as_index=False).agg('sum')

Output:

   A    B          D  C
0  1  0.0  16.5.2013  6
1  1  0.5  16.5.2013  7

Alternatively:

You essentially want to aggregate the data grouped by column 'B'. To aggregate column C, you just sum it. For the other columns, you only want to select a single value, since you believe they are always the same within each group. To do that, write a very simple function that aggregates those columns by taking the first value.

# Take the first value of each group
def sole_value(x):
    return x.iloc[0]

# Dictionary that maps columns to aggregation functions
agg_funcs = {'A': sole_value, 'C': 'sum', 'D': sole_value}

# Group and aggregate
df.groupby('B', as_index=False).agg(agg_funcs)

Output:

     B  A  C          D
0  0.0  1  6  16.5.2013
1  0.5  1  7  16.5.2013
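
As a possible variation, the custom sole_value helper can probably be replaced by the aggregation names that pandas already understands, 'first' and 'sum'. The sketch below assumes the same df as in the question and reorders the columns at the end to match df_new. One caveat: the groupby 'first' aggregation returns the first non-null value in each group, so it can differ from taking the literal first row if a column contains missing values.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame([
    {'D': '16.5.2013', 'A': 1, 'B': 0.0, 'C': 2},
    {'D': '16.5.2013', 'A': 1, 'B': 0.0, 'C': 4},
    {'D': '16.5.2013', 'A': 1, 'B': 0.5, 'C': 7},
])

# Sum 'C' per group of 'B'; keep the first (non-null) value of 'A' and 'D'
df_new = df.groupby('B', as_index=False).agg({'A': 'first', 'C': 'sum', 'D': 'first'})

# Restore the column order used in the question
df_new = df_new[['A', 'B', 'C', 'D']]
print(df_new)
```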

Of course, you need to be sure that the values in columns A and D really are equal within each group; otherwise you might be preserving the wrong data.
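
One way to check that assumption before aggregating is to count the distinct values of A and D inside each group of B. This is only a sketch: find_inconsistent_groups is a hypothetical helper name, not part of pandas, and nunique ignores missing values by default.

```python
import pandas as pd

def find_inconsistent_groups(frame, key, cols):
    """Hypothetical helper: return the groups of `key` in which any of
    `cols` holds more than one distinct value."""
    counts = frame.groupby(key)[cols].nunique()
    return counts[(counts > 1).any(axis=1)]

# Same data as in the question
df = pd.DataFrame([
    {'D': '16.5.2013', 'A': 1, 'B': 0.0, 'C': 2},
    {'D': '16.5.2013', 'A': 1, 'B': 0.0, 'C': 4},
    {'D': '16.5.2013', 'A': 1, 'B': 0.5, 'C': 7},
])

problems = find_inconsistent_groups(df, 'B', ['A', 'D'])
if not problems.empty:
    raise ValueError(f"A or D varies within these B groups:\n{problems}")
```

If the check passes, taking the first value of A and D per group loses no information.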
