Calculating mode in Pandas when using groupby

Question

I have a table as follows:

Col1 | Col2 | Col3
AAA  | 1    | a
AAA  | 1    | a
AAA  | 1    | b
AAA  | 2    | b
AAA  | 2    | b
AAA  | 2    | b
AAA  | 3    | a
BBB  | 1    | b
BBB  | 1    | b

I want to reduce the table in the following two steps:

Find the most frequently occurring value in Col3 corresponding to the (Col1, Col2) value pair.
From the result of step1, keep only the most frequently occurring value corresponding to Col1 value.

Applying step1 to the table above: The mode (or most frequently occurring value) corresponding to (AAA, 1) is a, and so on. We get:

Col1 | Col2 | newCol1
AAA  | 1    | a
AAA  | 2    | b
AAA  | 3    | a
BBB  | 1    | b

Applying step2 to this table, we see that a is the mode corresponding to AAA and b is the most frequently occurring value corresponding to BBB - so we get:

Col1 | newCol2
AAA  | a  
BBB  | b

BENY · Accepted Answer

Let us do it one line

df.groupby(['Col1','Col2']).Col3.apply(pd.Series.mode).\ 
      groupby(level=0).apply(pd.Series.mode)
Out[136]: 
Col1   
AAA   0    a
BBB   0    b
Name: Col3, dtype: object

Just for fun

pd.crosstab([df.Col1,df.Col2],df.Col3).idxmax(1).groupby(level=0).apply(pd.Series.mode)
Out[140]: 
Col1   
AAA   0    a
BBB   0    b
dtype: object

Calculating mode in Pandas when using groupby

Answers (2)

Related Questions