Reputation: 58
I have a dataframe that looks like:
road dirn length lane
1 L 0 2
1 L 0.6 2
1 L 1.2 1
1 L 0.8 2
2 R 1.5 3
2 R 0.4 2
2 R 9 3
I need to aggregate this dataframe where I will groupby the columns 'road' and 'dirn', and sum on column 'length' and get the most common value from column 'lanes'. The resultant dataframe should look like:
road dirn length lanes
1 L 2.6 2
1 L 10.9 3
I can do the groupby and sum easily using groupby, agg and sum. I have seen that most common value can be obtained using value_counts, but I am not sure how to combine both sum and value_counts. The actual dataframe is quite large and I might need to apply the sum and value counts to multiple columns.
Can someone please help me understand: i) how to groupby and get sum for 'length' and most common value for 'lanes' ii) how to expand that to groupby (multiple columns), sum multiple columns), most common value (renaming columns)
Upvotes: 1
Views: 1997
Reputation: 862661
Use GroupBy.agg
with sum
and for most common value is used Series.mode
:
df1 = (df.groupby(['road','dirn'], as_index=False)
.agg({'length':'sum','lane':lambda x: x.mode().iat[0]}))
Similar idea with Series.value_counts
:
df1 = (df.groupby(['road','dirn'], as_index=False)
.agg({'length':'sum','lane':lambda x: x.value_counts().index[0]}))
print (df1)
road dirn length lane
0 1 L 2.6 2
1 2 R 10.9 3
EDIT: If there is list of columns names for both aggregate function is possible use GroupBy.agg
with dictionary merged by **
:
sumL = ['length', 'accidents']
modeL = ['lane']
func = lambda x: x.value_counts().index[0]
d = {**dict.fromkeys(sumL, 'sum'), **dict.fromkeys(modeL, func)}
print (d)
df1 = df.groupby(['road','dirn'], as_index=False).agg(d)
Upvotes: 1
Reputation: 1318
Code below would give sum of column length
and provide mode
of column lane
df.groupby(['road','dirn']).agg({'length':[np.sum],'lane':[pd.Series.mode]})
Below code would sum column length
and provide max of column lane
df.groupby(['road','dirn']).agg({'length':[np.sum],'lane':[np.max]}
Upvotes: 0