Reputation: 12508
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({
'col1' : ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
'col2' : ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
'col3' : [1,1,2,3,4,3,2,2,4]
})
This gives me the grouped result, i.e. the size (row count) of each (col1, col3) group:
df.groupby(["col1", "col3"]).size()
This gives me only the first (col1, col3) label at which the maximum occurs:
df.groupby(["col1", "col3"]).size().idxmax()
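For the sample data, the two calls above can be reproduced in a small self-contained sketch. It shows that `idxmax` yields a single `(col1, col3)` tuple, while a boolean mask against `s.max()` keeps every label holding the overall maximum. Note that the mask compares against the global maximum; with this data every col1 group happens to peak at 2, so it coincides with the per-group result, but that will not hold in general.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
    'col2': ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
    'col3': [1, 1, 2, 3, 4, 3, 2, 2, 4],
})

# Series indexed by (col1, col3) holding the row count of each group
s = df.groupby(["col1", "col3"]).size()

# idxmax returns only the first (col1, col3) label with the largest count
first_max = s.idxmax()

# A boolean mask keeps every label whose count equals the overall maximum
all_max = s[s == s.max()]

print(first_max)
print(all_max)
```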
However, I want all maximum occurrences. Everything I find on Stack Overflow assumes there is already a column holding, e.g., the count, and then finds multiple maximums in that count column. My case is different: I group by two columns and then want the maximum/minimum values of the result of that grouping. How do I do this in pandas?
Here I expect:
(MM1, 2)
(MM2, 4)
(MM4, 4)
Edit:
df.groupby(["col1", "col3"]).size()
gives me
col1  col3
MM1   1       2
      2       1
MM2   3       2
      4       1
MM4   2       2
      4       1
Let's call the unnamed value column count.
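Naming that value column can be done with `Series.reset_index(name=...)`, which turns the size Series into a flat DataFrame. The sketch below also takes the maximum count per col1 group, which gives the 2, 2, 2 asked about next. The variable names `counts` and `per_group_max` are just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4', 'MM4'],
    'col2': ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
    'col3': [1, 1, 2, 3, 4, 3, 2, 2, 4],
})

# reset_index(name=...) converts the unnamed size Series into a DataFrame
# with columns col1, col3 and a value column called "count"
counts = df.groupby(["col1", "col3"]).size().reset_index(name="count")

# Largest count within each col1 group
per_group_max = counts.groupby("col1")["count"].max()

print(counts)
print(per_group_max)
```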
How do I get all the maximums of the count column within each col1 group, i.e. this:
2, 2, 2
How do I also get the corresponding values of col1 and col3? Something like this (column names only for clarity):
col1, col3, count
MM1, 1, 2
MM2, 3, 2
MM4, 2, 2
Upvotes: 0
Views: 1139
Reputation: 863741
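You can move the first index level (col1) back to a column with reset_index(level=0), so that col3 remains the index. Then group by col1 and aggregate with GroupBy.agg, combining idxmax (which returns the index label, i.e. the col3 value, of the maximum count) and GroupBy.max: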
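df1 = (df.groupby(["col1", "col3"])
         .size()
         .rename('count')                 # name the size values "count"
         .reset_index(level=0)            # col1 becomes a column; col3 stays as the index
         .groupby('col1')
         .agg(col3=('count', 'idxmax'),   # index label (col3) of the max count
              count=('count', 'max')))    # the max count itself
print(df1)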
      col3  count
col1
MM1      1      2
MM2      3      2
MM4      2      2
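Because idxmax keeps only the first col3 label per group, a col1 group in which two col3 values tie for the top count would lose one of them. That does not happen with the question's data, but since the question asks for all maximum occurrences, here is a sketch of a tie-preserving alternative. It swaps in a different technique, `transform('max')` plus a boolean mask, and runs it on a small hypothetical dataset in which MM1 has a tie.

```python
import pandas as pd

# Hypothetical data (not from the question): in MM1, col3 values 1 and 2
# both occur twice, so they tie for the top count
df = pd.DataFrame({
    'col1': ['MM1', 'MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2'],
    'col3': [1, 1, 2, 2, 3, 4, 3],
})

counts = df.groupby(["col1", "col3"]).size().rename("count")

# transform('max') broadcasts each col1 group's maximum back onto every row
# of that group, so the comparison keeps all col3 values reaching it
group_max = counts.groupby(level="col1").transform("max")
result = counts[counts == group_max].reset_index()

print(result)
```

Unlike the idxmax version, this can return more than one row per col1 value, which is what "all maximums" requires when ties are possible.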
Upvotes: 1