TomSelleck

Reputation: 6968

Return largest group from groupby

Is there a simple way to return the group with the most rows by a call to df.groupby(..)?

# Return the group with the most rows, e.g.
largest_group = df.groupby("community_area").max()

Upvotes: 0

Views: 773

Answers (5)

Quang Hoang

Reputation: 150765

Then it's just:

group_with_max_rows = df["community_area"].mode()[0]

# all rows in that group:
df[df['community_area'] == group_with_max_rows]
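
A minimal self-contained sketch of the same idea, assuming a small made-up frame (the `incident_id` column and all values are hypothetical). Note that `mode` is a method, so it needs parentheses, and that it returns a Series sorted in ascending order, so on a tie `[0]` picks the smallest label:

```python
import pandas as pd

# Hypothetical data: area 25 appears three times, more than any other area
df = pd.DataFrame({
    "community_area": [25, 8, 25, 32, 8, 25],
    "incident_id": [1, 2, 3, 4, 5, 6],
})

# mode() returns a Series of the most frequent value(s); take the first one
group_with_max_rows = df["community_area"].mode()[0]

# All rows belonging to that group
largest_group = df[df["community_area"] == group_with_max_rows]
print(group_with_max_rows)
print(largest_group)
```

If the tie behaviour matters, `df["community_area"].value_counts().idxmax()` is another way to get a single label.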

Upvotes: 4

Sammiti Yadav

Reputation: 88

# Number of rows in each group
data2 = df.groupby('Column').size()
# Keep the group(s) whose size equals the maximum
data2[data2==data2.max()]
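
This gives the group label(s) and their sizes rather than the rows themselves. A rough sketch of getting the rows back, assuming made-up data in which two groups tie (the `value` column and the variable names are hypothetical):

```python
import pandas as pd

# Hypothetical data: groups "A" and "B" are tied with two rows each
df = pd.DataFrame({
    "Column": ["A", "A", "B", "B", "C"],
    "value": [1, 2, 3, 4, 5],
})

sizes = df.groupby("Column").size()           # rows per group
largest = sizes[sizes == sizes.max()]         # keeps every tied group
rows = df[df["Column"].isin(largest.index)]   # rows of those groups
print(largest)
print(rows)
```

Unlike an `idxmax`-based approach, the boolean mask keeps all groups that share the maximum size.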

Upvotes: 0

wwii

Reputation: 23753

Use max with a key function that returns each group's shape:

>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  2.0  4.0
e  2.0  5.0
>>> gb = df.groupby('one')
>>> key,grp = max(gb,key=lambda x: x[1].shape)
>>> grp
   one  two
b  2.0  2.0
d  2.0  4.0
e  2.0  5.0
>>>
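
The same thing as a script rather than a REPL session, as a sketch: iterating a groupby object yields (key, sub-DataFrame) pairs, so `max` can compare them by row count. Here `len` is swapped in for `.shape`, which is my substitution and compares row counts only:

```python
import pandas as pd

# Same frame as in the session above
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 2.0, 2.0], "two": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=list("abcde"),
)

# Each item is a (key, sub-DataFrame) pair; pick the pair with the most rows
key, grp = max(df.groupby("one"), key=lambda pair: len(pair[1]))
print(key)
print(grp)
```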

Upvotes: 5

Michel Guimarães

Reputation: 411

Is this a solution for you?

import pandas as pd

df = pd.DataFrame([['A',1], ['A', 2], ['A', 3], ['B',1], ['B', 2],['C',1]], columns=['letter', 'number'])
df = df.groupby(['letter']).count()
df = df[df['number'] == df['number'].max()]
print(df)

I grouped with count() and then kept the group(s) whose count equals the max() of those counts.
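
One caveat worth checking on your own data: `count()` counts non-null values per column, while `size()` counts rows. A small sketch with made-up data containing one missing value shows where they differ:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "A" has three rows, one with a missing number
df = pd.DataFrame({
    "letter": ["A", "A", "A", "B", "B"],
    "number": [1, np.nan, 3, 1, 2],
})

counts = df.groupby("letter")["number"].count()  # non-null values per group
sizes = df.groupby("letter").size()              # rows per group

print(counts.to_dict())
print(sizes.to_dict())
```

With this data `count()` sees "A" and "B" as tied, whereas `size()` reports "A" as the larger group.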

Upvotes: 0

piRSquared

Reputation: 294348

Use groupby, size, and idxmax

The point is to reuse the groupby object and what it has already computed, so that we don't calculate more than we need to.

df.groupby('A').pipe(
    lambda g: df.loc[g.groups[g.size().idxmax()]]
)

    A
1   2
5   2
10  2
15  2

Less pipe and more readable

g = df.groupby('A')
k = g.size().idxmax()
i = g.groups[k]

df.loc[i]
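
For a result that is easy to check by hand, here is the same sequence on a tiny fixed frame (the values are made up for illustration and are not the random data from the Setup):

```python
import pandas as pd

# Hypothetical data: the value 2 occurs three times, at index 0, 2 and 5
df = pd.DataFrame({"A": [2, 7, 2, 5, 7, 2]})

g = df.groupby("A")
k = g.size().idxmax()   # label of the largest group
i = g.groups[k]         # index labels of the rows in that group
largest = df.loc[i]
print(k)
print(largest)
```

`idxmax` returns the first label holding the maximum, so only one group comes back if several are tied.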

Setup

import numpy as np
import pandas as pd

np.random.seed([3, 141592])
df = pd.DataFrame({'A': np.random.randint(10, size=20)})

Upvotes: 2
