Reputation: 21961
In the dataframe below:
0 [Normal, Normal, Normal]
1 [Good, Good, Good, Good, Good]
2 [Normal, Normal, Poor]
3 [Good, Good, Good, Good, Good, Normal, Normal,...
4 [Good, Good, Good, Good, Very Good]
...
1969 [Normal, Normal, Normal, Normal, Normal]
1970 [Poor]
1971 [Normal, Normal, Normal, Normal, Normal]
1972 [Poor]
1973 [Poor, Normal]
I want to find the most frequent value in the list in each row. If several values tie for most frequent, any of them will do. I tried using pandas mode, but that does not work.
Upvotes: 1
Views: 56
Reputation: 862521
You can first expand the lists into a DataFrame, then use DataFrame.mode along the rows and select the first column, or use a custom lambda function:
df['mode1'] = pd.DataFrame(df['col'].tolist(), index=df.index).mode(axis=1)[0]
df['mode2'] = df['col'].apply(lambda x: max(set(x), key=x.count))
print(df)
col mode1 mode2
0 [Normal, Normal, Normal] Normal Normal
1 [Good, Good, Good, Good, Good] Good Good
2 [Normal, Normal, Poor] Normal Normal
3 [Good, Good, Good, Good, Good, Normal, Normal,... Good Good
4 [Good, Good, Good, Good, Very Good] Good Good
1969 [Normal, Normal, Normal, Normal, Normal] Normal Normal
1970 [Poor] Poor Poor
1971 [Normal, Normal, Normal, Normal, Normal] Normal Normal
1972 [Poor] Poor Poor
1973 [Poor, Normal] Normal Normal
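For anyone who wants to run this end to end, here is a minimal self-contained sketch of both approaches. The sample rows are a small subset of the question's data picked for illustration, and `wide` is just a name introduced here for the expanded frame.

```python
import pandas as pd

# Small illustrative sample; the real frame in the question has ~2000 rows.
df = pd.DataFrame({
    "col": [
        ["Normal", "Normal", "Normal"],
        ["Good", "Good", "Good", "Good", "Very Good"],
        ["Normal", "Normal", "Poor"],
        ["Poor"],
    ]
})

# Approach 1: expand each list into its own columns (shorter lists are
# padded with missing values), take the row-wise mode, keep the first column.
wide = pd.DataFrame(df["col"].tolist(), index=df.index)
df["mode1"] = wide.mode(axis=1)[0]

# Approach 2: for each list, pick the distinct value with the highest count.
df["mode2"] = df["col"].apply(lambda x: max(set(x), key=x.count))

print(df)
```

Note that the two columns can disagree on ties: pandas mode returns the tied values sorted, so `[0]` picks the first in sort order, while `max(set(x), ...)` depends on the iteration order of the set. The question allows any tied value, so either is acceptable here.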
Unfortunately, pandas mode is slow. If performance is important, the second solution is the best choice here:
#[10000 rows x 1 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [46]: %timeit df['mode1']=pd.DataFrame(df['col'].tolist(),index=df.index).mode(axis=1)[0]
3.84 s ± 87.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [47]: %timeit df['mode2'] = df['col'].apply(lambda x: max(set(x), key=x.count))
11.4 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#Mayank Porwal solution
In [48]: %timeit df['mode3'] = df['col'].apply(lambda x: Counter(x).most_common(1)[0][0])
55.7 ms ± 3.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 1
Reputation: 34046
You can also use Python's collections.Counter.most_common method with Series.apply here:
In [2223]: from collections import Counter
In [2212]: df = pd.DataFrame({'col': [['Normal', 'Normal', 'Normal'], ['Good', 'Good', 'Good', 'Good', 'Good'], ['Normal', 'Normal', 'Poor']]})
In [2213]: df
Out[2213]:
col
0 [Normal, Normal, Normal]
1 [Good, Good, Good, Good, Good]
2 [Normal, Normal, Poor]
In [2221]: df['most_common'] = df['col'].apply(lambda x: Counter(x).most_common(1)[0][0])
In [2222]: df
Out[2222]:
col most_common
0 [Normal, Normal, Normal] Normal
1 [Good, Good, Good, Good, Good] Good
2 [Normal, Normal, Poor] Normal
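One caveat with this approach: `Counter(x).most_common(1)` returns an empty list when `x` is empty, so the `[0][0]` indexing raises an IndexError. If empty lists can occur in your data, a small wrapper avoids that. This is only a sketch; the helper name `most_common_or_none` is hypothetical, and returning None for empty lists is an assumption about what you want.

```python
from collections import Counter

import pandas as pd


def most_common_or_none(values):
    """Return the most frequent item in values, or None if values is empty."""
    if not values:
        return None
    # most_common(1) gives a list with one (item, count) pair.
    return Counter(values).most_common(1)[0][0]


# Illustrative data: a tie, an empty list, and a clear winner.
df = pd.DataFrame({"col": [["Poor", "Normal"], [], ["Good", "Poor", "Poor"]]})
df["most_common"] = df["col"].apply(most_common_or_none)

print(df)
```

On ties, `most_common` orders items with equal counts by first occurrence, so `['Poor', 'Normal']` gives `Poor` here.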
Upvotes: 1