Reputation: 81

How can I find the most frequent two-column combination in a dataframe in python

I have my data in pandas data frame as follows:

df = pd.DataFrame({'a':[1,2,3,3,4,4,4], 'b':[2,3,4,4,5,5,5]})

So the dataframe looks like this:

The column 'a','b' combination here are: 12(1), 23(1), 34(2), 45(3). I am trying to select 4 and 5 and print them out because their combination has most occurrences (3 times).

My code is:

counts = df.groupby(['a','b']).size().sort_values(ascending=False)
print(counts)

Output:

a  b
4  5    3
3  4    2
2  3    1
1  2    1
dtype: int64

But this only gives me a column [3,2,1,1]. This are the numbers combination counts. How can I access elements 4 and 5 individually so I can print them out?

Thanks in advance!

Upvotes: 7

Answers (3)

Karn Kumar

Reputation: 8816

Simplest one to use mode in pandas DataFrame. It'll give a most frequent values across the rows or columns:

>>> df
   a  b
0  1  2
1  2  3
2  3  4
3  3  4
4  4  5
5  4  5
6  4  5

>>> df.mode()
   a  b
0  4  5

Upvotes: 1

BENY

Reputation: 323226

Using idxmax, even the result is inorder, you still can find the index of max value

df.groupby(['a','b']).size().idxmax()
Out[15]: (4, 5)

Upvotes: 3

jpp

Reputation: 164623

Pandas GroupBy objects are indexed by grouper key(s). In the case of multiple keys, this will mean a MultiIndex. You can just extract the first index of your result to give a tuple representing the most common combination:

counts.index[0]  # (4, 5)

Upvotes: 2

How can I find the most frequent two-column combination in a dataframe in python

Answers (3)

Related Questions