Reputation: 19
here is a sample of my dataset:
column1 | column2 |
---|---|
A | apple |
A | banana |
A | pineapple |
B | apple |
B | banana |
C | grape |
C | banana |
C | apple |
C | mandarine |
D | apple |
D | banana |
i want to apply the group by method and list out the most common groups that is present in column 1. For example, my output should produce (apple, banana) as the most common group.
Thank you!
Upvotes: 1
Views: 84
Reputation: 1319
You can use DataFrame.GroupBy method and then, you can use count() method:
df = pd.DataFrame(columns=['column1', 'column2'], data =[['A', 'apple'],
['A', 'banana'],
['A', 'pineapple'],
['B', 'apple'],
['B', 'banana'],
['C', 'grape'],
['C', 'banana'],
['C', 'apple'],
['C', 'mandarine'],
['D', 'apple'],
['D', 'banana']])
df_result = df.groupby('column2').count()
df_result = df_result[df_result['column1'] == max(df_result['column1'])]
Output:
column2 column1
0 apple 4
1 banana 4
Edit: According to the question, you want as output ('apple', 'banana')
. So, continuing with the data above, you can do:
result_tuple = tuple((value for _, value in df_result['column2'].items()))
Output: ('apple', 'banana')
Upvotes: 1
Reputation: 323226
Something like mode
df.drop_duplicates().column2.mode()
Out[156]:
0 apple
1 banana
dtype: object
Upvotes: 1
Reputation: 153460
Try this:
df.groupby('column2')['column1'].nunique()
Output:
column2
apple 4
banana 4
grape 1
mandarine 1
pineapple 1
Name: column1, dtype: int64
Then,
s = df.groupby('column2')['column1'].nunique()
s[s == s.max()].reset_index()
Output:
column2 column1
0 apple 4
1 banana 4
Upvotes: 1