Reputation: 87
I have a table as follows:
Col1 | Col2 | Col3
AAA | 1 | a
AAA | 1 | a
AAA | 1 | b
AAA | 2 | b
AAA | 2 | b
AAA | 2 | b
AAA | 3 | a
BBB | 1 | b
BBB | 1 | b
I want to reduce the table in the following two steps:
Find the most frequently occurring value in Col3 corresponding to the (Col1, Col2) value pair.
From the result of step1, keep only the most frequently occurring value corresponding to Col1 value.
Applying step1 to the table above:
The mode (or most frequently occurring value) corresponding to (AAA, 1)
is a
, and so on. We get:
Col1 | Col2 | newCol1
AAA | 1 | a
AAA | 2 | b
AAA | 3 | a
BBB | 1 | b
Applying step2 to this table, we see that a
is the mode corresponding to AAA
and b
is the most frequently occurring value corresponding to BBB
- so we get:
Col1 | newCol2
AAA | a
BBB | b
Upvotes: 3
Views: 433
Reputation: 323226
Let us do it one line
df.groupby(['Col1','Col2']).Col3.apply(pd.Series.mode).\
groupby(level=0).apply(pd.Series.mode)
Out[136]:
Col1
AAA 0 a
BBB 0 b
Name: Col3, dtype: object
Just for fun
pd.crosstab([df.Col1,df.Col2],df.Col3).idxmax(1).groupby(level=0).apply(pd.Series.mode)
Out[140]:
Col1
AAA 0 a
BBB 0 b
dtype: object
Upvotes: 3
Reputation: 75080
So you mean:
df_new=df.groupby(['Col1','Col2'])['Col3'].apply(lambda x:x.mode()).reset_index([0,1]).\
groupby('Col1')['Col3'].apply(lambda x: x.mode()).reset_index(0).reset_index(drop=True)
print(df_new)
Col1 Col3
0 AAA a
1 BBB b
Upvotes: 2