Reputation: 6968
Is there a simple way to return the group with the most rows from a call to df.groupby(..)?
# Return the group with the most rows, e.g.
largest_group = df.groupby("community_area").max()
Upvotes: 0
Views: 773
Reputation: 150765
Then it's just:
group_with_max_rows = df["community_area"].mode()[0]
# all rows in that group:
df[df['community_area'] == group_with_max_rows]
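A self-contained sketch of the same idea, assuming a small made-up frame (the data and the "value" column are hypothetical). Note that mode() is a method and returns a Series of the most frequent value(s) in sorted order, so on a tie [0] picks the smallest of the tied keys.

```python
import pandas as pd

# Hypothetical data: area 7 appears three times, more than any other value.
df = pd.DataFrame({
    "community_area": [7, 3, 7, 5, 7, 3],
    "value": [10, 20, 30, 40, 50, 60],
})

# mode() returns a Series of the most frequent value(s); take the first.
group_with_max_rows = df["community_area"].mode()[0]

# All rows belonging to that group.
largest_group = df[df["community_area"] == group_with_max_rows]
print(largest_group)
```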
Upvotes: 4
Reputation: 23753
Use max with a key function that returns each group's shape:
>>> df
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d 2.0 4.0
e 2.0 5.0
>>> gb = df.groupby('one')
>>> key,grp = max(gb,key=lambda x: x[1].shape)
>>> grp
one two
b 2.0 2.0
d 2.0 4.0
e 2.0 5.0
>>>
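For reference, a runnable version of the session above, as a sketch. Iterating a groupby object yields (key, sub-frame) pairs, and shape is a (rows, columns) tuple, so the comparison is decided by row count first. Using len as the key, as assumed here, says the same thing a little more directly.

```python
import pandas as pd

# Rebuild the frame shown in the session above.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 2.0, 2.0], "two": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=list("abcde"),
)

gb = df.groupby("one")

# Each item is a (key, sub-frame) pair; pick the pair with the most rows.
key, grp = max(gb, key=lambda pair: len(pair[1]))
print(key)
print(grp)
```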
Upvotes: 5
Reputation: 411
Is this a solution for you?
import pandas as pd
df = pd.DataFrame([['A',1], ['A', 2], ['A', 3], ['B',1], ['B', 2],['C',1]], columns=['letter', 'number'])
df = df.groupby(['letter']).count()
df = df[df['number'] == df['number'].max()]
print(df)
I grouped using count() and then kept the rows whose count equals the max() value of the grouping.
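One caveat worth sketching: count() counts non-null values per column, so if the counted column can contain NaN, size() may be the safer measure of how many rows a group has. The variant below is an assumption about what is wanted, not the only way to do it; it uses size() and, like the code above, keeps every group that ties for the maximum.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 1], ["A", 2], ["A", 3], ["B", 1], ["B", 2], ["C", 1]],
    columns=["letter", "number"],
)

# Rows per group, counting rows regardless of missing values.
sizes = df.groupby("letter").size()

# Keep every group whose size equals the maximum (ties are preserved).
largest = sizes[sizes == sizes.max()]
print(largest)

# The original rows that belong to the largest group(s).
rows = df[df["letter"].isin(largest.index)]
print(rows)
```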
Upvotes: 0
Reputation: 294348
Use groupby, size, and idxmax
The point is to reuse the groupby object and what it has already computed, so that we avoid calculating more than we need to.
df.groupby('A').pipe(
lambda g: df.loc[g.groups[g.size().idxmax()]]
)
A
1 2
5 2
10 2
15 2
Less pipe and more readable:
g = df.groupby('A')
k = g.size().idxmax()
i = g.groups[k]
df.loc[i]
# Setup for the sample frame used above
import numpy as np
import pandas as pd
np.random.seed([3, 141592])
df = pd.DataFrame({'A': np.random.randint(10, size=20)})
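The same steps in one runnable order, as a sketch. It assumes a small hand-written frame in place of the random one so the result is easy to check by eye. size() gives the row count per key, idxmax() returns the key with the largest count (the first one on a tie), and groups maps each key to the row labels of its members.

```python
import pandas as pd

# Hypothetical data: the value 2 occurs three times, at labels 1, 5 and 6.
df = pd.DataFrame({"A": [4, 2, 7, 4, 9, 2, 2, 7]})

g = df.groupby("A")
k = g.size().idxmax()   # key of the largest group
i = g.groups[k]         # row labels belonging to that group
largest = df.loc[i]
print(largest)
```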
Upvotes: 2