D_M
D_M

Reputation: 87

Calculating mode in Pandas when using groupby

I have a table as follows:

Col1 | Col2 | Col3
AAA  | 1    | a
AAA  | 1    | a
AAA  | 1    | b
AAA  | 2    | b
AAA  | 2    | b
AAA  | 2    | b
AAA  | 3    | a
BBB  | 1    | b
BBB  | 1    | b

I want to reduce the table in the following two steps:

  1. Find the most frequently occurring value in Col3 corresponding to the (Col1, Col2) value pair.

  2. From the result of step1, keep only the most frequently occurring value corresponding to Col1 value.

Applying step1 to the table above: The mode (or most frequently occurring value) corresponding to (AAA, 1) is a, and so on. We get:

Col1 | Col2 | newCol1
AAA  | 1    | a
AAA  | 2    | b
AAA  | 3    | a
BBB  | 1    | b

Applying step2 to this table, we see that a is the mode corresponding to AAA and b is the most frequently occurring value corresponding to BBB - so we get:

Col1 | newCol2
AAA  | a  
BBB  | b

Upvotes: 3

Views: 433

Answers (2)

BENY
BENY

Reputation: 323226

Let us do it one line

df.groupby(['Col1','Col2']).Col3.apply(pd.Series.mode).\ 
      groupby(level=0).apply(pd.Series.mode)
Out[136]: 
Col1   
AAA   0    a
BBB   0    b
Name: Col3, dtype: object

Just for fun

pd.crosstab([df.Col1,df.Col2],df.Col3).idxmax(1).groupby(level=0).apply(pd.Series.mode)
Out[140]: 
Col1   
AAA   0    a
BBB   0    b
dtype: object

Upvotes: 3

anky
anky

Reputation: 75080

So you mean:

df_new=df.groupby(['Col1','Col2'])['Col3'].apply(lambda x:x.mode()).reset_index([0,1]).\
groupby('Col1')['Col3'].apply(lambda x: x.mode()).reset_index(0).reset_index(drop=True)
print(df_new)

  Col1   Col3
0  AAA      a
1  BBB      b

Upvotes: 2

Related Questions