determine common value in pandas column

Question

In the dataframe below:

0                                [Normal, Normal, Normal]
1                          [Good, Good, Good, Good, Good]
2                                  [Normal, Normal, Poor]
3       [Good, Good, Good, Good, Good, Normal, Normal,...
4                     [Good, Good, Good, Good, Very Good]
                              ...
1969             [Normal, Normal, Normal, Normal, Normal]
1970                                               [Poor]
1971             [Normal, Normal, Normal, Normal, Normal]
1972                                               [Poor]
1973                                       [Poor, Normal]

I want to find the most frequent value in the list in each row. If several values tie for most frequent, any of them will do. I tried using pandas mode, but that does not work.

jezrael · Accepted Answer

You can either expand the lists into a DataFrame first, call DataFrame.mode along the rows and select the first column, or use a custom lambda function:

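import pandas as pd
# Expand each list into columns, take the row-wise mode, keep the first result
df['mode1'] = pd.DataFrame(df['col'].tolist(), index=df.index).mode(axis=1)[0]
# For each list, pick the distinct value with the highest count
df['mode2'] = df['col'].apply(lambda x: max(set(x), key=x.count))
print(df)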
                                                    col   mode1   mode2
0                              [Normal, Normal, Normal]  Normal  Normal
1                        [Good, Good, Good, Good, Good]    Good    Good
2                                [Normal, Normal, Poor]  Normal  Normal
3     [Good, Good, Good, Good, Good, Normal, Normal,...    Good    Good
4                   [Good, Good, Good, Good, Very Good]    Good    Good
1969           [Normal, Normal, Normal, Normal, Normal]  Normal  Normal
1970                                             [Poor]    Poor    Poor
1971           [Normal, Normal, Normal, Normal, Normal]  Normal  Normal
1972                                             [Poor]    Poor    Poor
1973                                     [Poor, Normal]  Normal  Normal
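
The lambda above assumes every list is non-empty (max raises ValueError on an empty set), and on a tie it returns whichever value the set happens to yield first, which for strings can differ between Python runs. The following is a minimal sketch of a helper that covers both cases, assuming that "first value seen" is an acceptable tie-break and that None is an acceptable result for an empty list. The name most_common_value is hypothetical and not part of the original answer.

```python
from collections import Counter

def most_common_value(values):
    """Return the most frequent item in values, or None if values is empty.

    Ties go to the item that appears first in the list, because Counter
    keeps insertion order on Python 3.7+.
    """
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

print(most_common_value(["Poor", "Normal"]))             # Poor (tie, first seen wins)
print(most_common_value(["Good", "Good", "Very Good"]))  # Good
print(most_common_value([]))                             # None
```

With the dataframe from the question it could be applied as df['col'].apply(most_common_value). Note that on a tie this may pick a different value than the mode1 column does, which is fine here since any of the tied values is acceptable.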

Unfortunately, pandas mode is slow; if performance is important, the second solution is the best choice here:

#[10000 rows x 1 columns]
df = pd.concat([df] * 1000, ignore_index=True)


In [46]: %timeit df['mode1']=pd.DataFrame(df['col'].tolist(),index=df.index).mode(axis=1)[0]
3.84 s ± 87.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [47]: %timeit df['mode2'] = df['col'].apply(lambda x: max(set(x), key=x.count))
11.4 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#Mayank Porwal's solution (requires: from collections import Counter)
In [48]: %timeit df['mode3'] = df['col'].apply(lambda x: Counter(x).most_common(1)[0][0])
55.7 ms ± 3.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
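
The lambda is faster than Counter in this benchmark, where each list is short and holds only a few distinct values. Since x.count rescans the whole list once per distinct value, its cost grows with list length times the number of distinct values, while Counter makes a single pass. The following is a rough sketch for checking how the two compare on your own data using the standard-library timeit. The function names and sample lists are hypothetical, and no timings are claimed here.

```python
import timeit
from collections import Counter

def mode_by_count(x):
    # One list.count scan per distinct value
    return max(set(x), key=x.count)

def mode_by_counter(x):
    # One pass over the list to build the counts
    return Counter(x).most_common(1)[0][0]

# Hypothetical inputs: a short list like those in the question,
# and a longer list with many distinct values
short = ["Good"] * 4 + ["Very Good"]
long_ = [str(i % 200) for i in range(2000)] + ["Good"] * 50

for name, data in [("short", short), ("long", long_)]:
    for fn in (mode_by_count, mode_by_counter):
        t = timeit.timeit(lambda: fn(data), number=20)
        print(f"{name:5s} {fn.__name__:16s} {t:.4f} s")
```

Both functions return the same value whenever there is a single most frequent item, so the choice between them only affects speed and tie-breaking.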
