pandas groupby aggregate specific columns by sum and other columns by most common value

Question

I have a dataframe that looks like:

road    dirn    length  lane
1        L         0     2
1        L       0.6     2
1        L       1.2     1
1        L       0.8     2
2        R       1.5     3
2        R       0.4     2
2        R         9     3

I need to aggregate this dataframe where I will groupby the columns 'road' and 'dirn', and sum on column 'length' and get the most common value from column 'lanes'. The resultant dataframe should look like:

road    dirn    length  lanes
1        L       2.6     2
1        L      10.9     3

I can do the groupby and sum easily using groupby, agg and sum. I have seen that most common value can be obtained using value_counts, but I am not sure how to combine both sum and value_counts. The actual dataframe is quite large and I might need to apply the sum and value counts to multiple columns.

Can someone please help me understand: i) how to groupby and get sum for 'length' and most common value for 'lanes' ii) how to expand that to groupby (multiple columns), sum multiple columns), most common value (renaming columns)

jezrael · Accepted Answer

Use GroupBy.agg with sum and for most common value is used Series.mode:

df1 = (df.groupby(['road','dirn'], as_index=False)
         .agg({'length':'sum','lane':lambda x: x.mode().iat[0]}))

Similar idea with Series.value_counts:

df1 = (df.groupby(['road','dirn'], as_index=False)
         .agg({'length':'sum','lane':lambda x: x.value_counts().index[0]}))

print (df1)
   road dirn  length  lane
0     1    L     2.6     2
1     2    R    10.9     3

EDIT: If there is list of columns names for both aggregate function is possible use GroupBy.agg with dictionary merged by **:

sumL = ['length', 'accidents']
modeL = ['lane']

func = lambda x: x.value_counts().index[0]

d = {**dict.fromkeys(sumL, 'sum'), **dict.fromkeys(modeL, func)}
print (d)

df1 = df.groupby(['road','dirn'], as_index=False).agg(d)

pandas groupby aggregate specific columns by sum and other columns by most common value

Answers (2)

Related Questions