alkes
alkes

Reputation: 58

pandas groupby aggregate specific columns by sum and other columns by most common value

I have a dataframe that looks like:

road    dirn    length  lane
1        L         0     2
1        L       0.6     2
1        L       1.2     1
1        L       0.8     2
2        R       1.5     3
2        R       0.4     2
2        R         9     3

I need to aggregate this dataframe where I will groupby the columns 'road' and 'dirn', and sum on column 'length' and get the most common value from column 'lanes'. The resultant dataframe should look like:

road    dirn    length  lanes
1        L       2.6     2
1        L      10.9     3

I can do the groupby and sum easily using groupby, agg and sum. I have seen that most common value can be obtained using value_counts, but I am not sure how to combine both sum and value_counts. The actual dataframe is quite large and I might need to apply the sum and value counts to multiple columns.

Can someone please help me understand: i) how to groupby and get sum for 'length' and most common value for 'lanes' ii) how to expand that to groupby (multiple columns), sum multiple columns), most common value (renaming columns)

Upvotes: 1

Views: 1997

Answers (2)

jezrael
jezrael

Reputation: 862661

Use GroupBy.agg with sum and for most common value is used Series.mode:

df1 = (df.groupby(['road','dirn'], as_index=False)
         .agg({'length':'sum','lane':lambda x: x.mode().iat[0]}))

Similar idea with Series.value_counts:

df1 = (df.groupby(['road','dirn'], as_index=False)
         .agg({'length':'sum','lane':lambda x: x.value_counts().index[0]}))

print (df1)
   road dirn  length  lane
0     1    L     2.6     2
1     2    R    10.9     3

EDIT: If there is list of columns names for both aggregate function is possible use GroupBy.agg with dictionary merged by **:

sumL = ['length', 'accidents']
modeL = ['lane']

func = lambda x: x.value_counts().index[0]

d = {**dict.fromkeys(sumL, 'sum'), **dict.fromkeys(modeL, func)}
print (d)

df1 = df.groupby(['road','dirn'], as_index=False).agg(d)

Upvotes: 1

Furqan Hashim
Furqan Hashim

Reputation: 1318

Code below would give sum of column length and provide mode of column lane

df.groupby(['road','dirn']).agg({'length':[np.sum],'lane':[pd.Series.mode]})

Below code would sum column length and provide max of column lane

df.groupby(['road','dirn']).agg({'length':[np.sum],'lane':[np.max]}

Upvotes: 0

Related Questions