Reputation: 10223
I have a pandas DataFrame with 3 columns:
import pandas as pd
c = {'a': [['US']], 'b': [['US']], 'c': [['US', 'BE']]}
df = pd.DataFrame(c, columns=['a', 'b', 'c'])
Now I need the max value of these 3 columns.
I've tried:
df['max_val'] = df[['a','b','c']].max(axis=1)
The result is NaN instead of the expected output: US.
How can I get the max value for these 3 columns? (And what if one of them contains NaN?)
Upvotes: 2
Views: 739
Reputation: 1
Since some of your elements are lists, I think the code below should work:
from collections import Counter

arr = []
for i in df:                              # each column label
    for j in range(len(df[i])):           # each row in that column
        for k in range(len(df[i][j])):    # each item in the cell's list
            arr.append(df[i][j][k])
b = Counter(arr)
print(b.most_common())
This counts every item across the whole DataFrame and prints the items with their counts, most common first.
Upvotes: 0
Reputation: 463
Because your data are lists, you can't use pandas' mode() directly: list objects are unhashable, so mode() won't work on them.
One solution is to convert the elements of the DataFrame's row to strings and then call mode().
Check this:
>>> import pandas as pd
>>> c = {'a': [['US','BE']],'b': [['US']], 'c': [['US','BE']]}
>>> df = pd.DataFrame(c, columns = ['a','b','c'])
>>> x = df.iloc[0].apply(lambda x: str(x))
>>> x.mode()
# Answer:
0 ['US', 'BE']
dtype: object
>>> d = {'a': [['US']],'b': [['US']], 'c': [['US','BE']]}
>>> df2 = pd.DataFrame(d, columns = ['a','b','c'])
>>> z = df2.iloc[0].apply(lambda z: str(z))
>>> z.mode()
# Answer:
0 ['US']
dtype: object
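The string conversion above returns the text "['US']" rather than a list. As a minimal sketch of a variation (not part of the original answer), the cells can be converted to tuples instead, which are hashable and can be turned back into a list afterwards. The names row_as_tuples and most_common are only illustrative.

```python
import pandas as pd

d = {'a': [['US']], 'b': [['US']], 'c': [['US', 'BE']]}
df2 = pd.DataFrame(d, columns=['a', 'b', 'c'])

# Tuples are hashable, so mode() can count them; lists are not.
row_as_tuples = df2.iloc[0].apply(tuple)

# mode() returns a Series; take its first entry and convert it back to a list.
most_common = list(row_as_tuples.mode().iloc[0])
print(most_common)
```

This assumes every cell in the row is a list; a NaN cell would make tuple() fail and would need to be filtered out first.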
Upvotes: 0
Reputation: 26686
If, as @Erfan stated, you want the most common value in each row, use .agg() with mode:
df.agg('mode', axis=1)
0
0 [US, BE]
1 [US]
Upvotes: 0
Reputation: 863791
Use:
c={'a': [['US', 'BE'],['US']],'b': [['US'],['US']], 'c': [['US','BE'],['US','BE']]}
df = pd.DataFrame(c, columns = ['a','b','c'])
from collections import Counter
df = df[['a','b','c']].apply(lambda x: list(Counter(map(tuple, x)).most_common()[0][0]), 1)
print (df)
0 [US, BE]
1 [US]
dtype: object
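The question also asks what happens when a cell contains NaN. In that case tuple() fails on the float. A possible sketch, assuming NaN cells should simply be ignored, is to count only the cells that are lists. The helper name most_common_list and the sample data are made up for illustration.

```python
from collections import Counter

import numpy as np
import pandas as pd


def most_common_list(row):
    """Return the most frequent list in a row, skipping non-list cells such as NaN."""
    # Lists are converted to tuples so Counter can hash them.
    counts = Counter(tuple(v) for v in row if isinstance(v, list))
    if not counts:
        return np.nan  # the row contained no lists at all
    return list(counts.most_common(1)[0][0])


c = {'a': [['US', 'BE'], ['US']],
     'b': [['US'], np.nan],
     'c': [['US', 'BE'], ['US']]}
df = pd.DataFrame(c, columns=['a', 'b', 'c'])

result = df.apply(most_common_list, axis=1)
print(result)
```

Returning NaN for a row without any lists is a design choice of this sketch; an empty list or None would work equally well depending on what the rest of the pipeline expects.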
Upvotes: 1