user308827

Reputation: 21961

Determine the most common value in each list of a pandas column

In the dataframe below:

0                                [Normal, Normal, Normal]
1                          [Good, Good, Good, Good, Good]
2                                  [Normal, Normal, Poor]
3       [Good, Good, Good, Good, Good, Normal, Normal,...
4                     [Good, Good, Good, Good, Very Good]
                              ...
1969             [Normal, Normal, Normal, Normal, Normal]
1970                                               [Poor]
1971             [Normal, Normal, Normal, Normal, Normal]
1972                                               [Poor]
1973                                       [Poor, Normal]

I want to determine the most repeated value in the list in each row. If no single value is repeated most often, any of the tied values will do. I tried using pandas mode, but that does not work.

Upvotes: 1

Views: 56

Answers (2)

jezrael

Reputation: 862521

You can first expand the lists into a DataFrame and then use DataFrame.mode, selecting the first column, or you can use a custom lambda function:

df['mode1'] = pd.DataFrame(df['col'].tolist(), index=df.index).mode(axis=1)[0]
df['mode2'] = df['col'].apply(lambda x: max(set(x), key=x.count))
print (df)
                                                    col   mode1   mode2
0                              [Normal, Normal, Normal]  Normal  Normal
1                        [Good, Good, Good, Good, Good]    Good    Good
2                                [Normal, Normal, Poor]  Normal  Normal
3     [Good, Good, Good, Good, Good, Normal, Normal,...    Good    Good
4                   [Good, Good, Good, Good, Very Good]    Good    Good
1969           [Normal, Normal, Normal, Normal, Normal]  Normal  Normal
1970                                             [Poor]    Poor    Poor
1971           [Normal, Normal, Normal, Normal, Normal]  Normal  Normal
1972                                             [Poor]    Poor    Poor
1973                                     [Poor, Normal]  Normal  Normal
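For anyone who wants to reproduce this, here is a minimal self-contained sketch of both approaches. The sample rows are hypothetical stand-ins that mirror the question's data, and the helper name wide is only for illustration. The sample avoids tied lists on purpose: when several values tie, the two approaches may pick different ones.

```python
import pandas as pd

# Hypothetical sample data mirroring a few rows from the question.
df = pd.DataFrame({
    "col": [
        ["Normal", "Normal", "Normal"],
        ["Good", "Good", "Good", "Good", "Good"],
        ["Normal", "Normal", "Poor"],
        ["Good", "Good", "Good", "Good", "Very Good"],
        ["Poor"],
    ]
})

# Approach 1: expand each list into one row of a wide DataFrame
# (shorter lists are padded with missing values), then take the
# row-wise mode and keep its first column.
wide = pd.DataFrame(df["col"].tolist(), index=df.index)
df["mode1"] = wide.mode(axis=1)[0]

# Approach 2: for each list, pick the distinct value with the highest count.
df["mode2"] = df["col"].apply(lambda x: max(set(x), key=x.count))

print(df)
```

Both new columns should agree on this sample, since every list has a single most frequent value.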

Unfortunately, pandas mode is slow; if performance is important, the second solution is the best choice here:

#[10000 rows x 1 columns]
df = pd.concat([df] * 1000, ignore_index=True)


In [46]: %timeit df['mode1']=pd.DataFrame(df['col'].tolist(),index=df.index).mode(axis=1)[0]
3.84 s ± 87.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [47]: %timeit df['mode2'] = df['col'].apply(lambda x: max(set(x), key=x.count))
11.4 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#Mayank Porwal solution (needs: from collections import Counter)
In [48]: %timeit df['mode3'] = df['col'].apply(lambda x: Counter(x).most_common(1)[0][0])
55.7 ms ± 3.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
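One caveat on the timings: max(set(x), key=x.count) rescans the list once per distinct value, while Counter counts everything in a single pass, so the ranking may change for long lists with many distinct values. The sketch below is a rough way to check this on your own data using only the standard library. The function names and both test lists are assumptions made for illustration, and no timings are quoted because they depend on the machine.

```python
import timeit
from collections import Counter

# Short list with few distinct values, similar to the question's data.
short = ["Good", "Good", "Good", "Good", "Very Good"]
# Hypothetical stress case: 500 distinct values, with "0" appearing twice.
distinct = [str(i) for i in range(500)] + ["0"]


def by_count(x):
    # Calls x.count once per distinct value in x.
    return max(set(x), key=x.count)


def by_counter(x):
    # Counts all values in a single pass over x.
    return Counter(x).most_common(1)[0][0]


for name, data in [("short", short), ("many distinct", distinct)]:
    for fn in (by_count, by_counter):
        seconds = timeit.timeit(lambda: fn(data), number=50)
        print(f"{name:>14} {fn.__name__:<10} {seconds:.4f}s")
```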

Upvotes: 1

Mayank Porwal

Reputation: 34046

You can also use the most_common method of Python's collections.Counter together with Series.apply here:

In [2223]: from collections import Counter

In [2212]: df = pd.DataFrame({'col': [['Normal', 'Normal', 'Normal'], ['Good', 'Good', 'Good', 'Good', 'Good'], ['Normal', 'Normal', 'Poor']]})

In [2213]: df
Out[2213]: 
                              col
0        [Normal, Normal, Normal]
1  [Good, Good, Good, Good, Good]
2          [Normal, Normal, Poor]

In [2221]: df['most_common'] = df['col'].apply(lambda x: Counter(x).most_common(1)[0][0])

In [2222]: df
Out[2222]: 
                              col most_common
0        [Normal, Normal, Normal]      Normal
1  [Good, Good, Good, Good, Good]        Good
2          [Normal, Normal, Poor]      Normal
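Since the question allows any value when there is a tie, it may help to know how ties resolve. On Python 3.7 and later, Counter.most_common orders elements with equal counts by first appearance, so the tie winner is the value seen first in the list. Below is a small sketch of a helper built on that; the name most_common_value and the default parameter for empty lists are my own additions, not part of the answer above.

```python
from collections import Counter


def most_common_value(values, default=None):
    """Return the most frequent item; ties go to the item seen first.

    An empty list returns `default` instead of raising IndexError.
    """
    if not values:
        return default
    return Counter(values).most_common(1)[0][0]


print(most_common_value(["Normal", "Normal", "Poor"]))  # Normal
print(most_common_value(["Poor", "Normal"]))            # Poor (tie, first seen)
print(most_common_value([]))                            # None
```

It can be applied in the same way as the lambda, for example df['col'].apply(most_common_value).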

Upvotes: 1
