WGP
WGP

Reputation: 748

Conditional counting across a row in pandas when matching a string

I have pandas dataframe of the form,df=

index,result1,result2,result3 
  0     s       u       s     
  1     u       s       u   
  2     s                     
  3     s       s       u 

i would like to add another column that contains a list of the number of times s occurs in that row, for example

index,result1,result2,result3,count 
  0     s       u       s      2
  1     u       s       u      1
  2     s                      1
  3     s       s       u      2

i have tried the following code

col=['result1','result2','result3']
df[cols].count(axis=1)

but this returns

0,3
1,3
2,1
3,3

so this counts the number of elements, i then tried

df[df[cols]=='s'].count(axis=1)

but this returned the following error: "Could not compare ['s'] with block values"

Any help would be greatly appreciated

Upvotes: 1

Views: 1705

Answers (1)

jezrael
jezrael

Reputation: 863236

For me works cast to string by astype numeric and NaN columns return your error:

print (df)
   index result1 result2  result3  result4
0      0       s       u        7      NaN
1      1       u       s        7      NaN
2      2       s     NaN        8      NaN
3      3       s       s        7      NaN
4      4     NaN     NaN        2      NaN

print (df.dtypes)
index        int64
result1     object
result2     object
result3      int64
result4    float64
dtype: object

cols = ['result1','result2','result3','result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Or sum only True values from boolean mask:

print (df[cols].astype(str) == 's')

  result1 result2 result3 result4
0    True   False   False   False
1   False    True   False   False
2    True   False   False   False
3    True    True   False   False
4   False   False   False   False

cols = ['result1','result2','result3','result4']
df['count'] = (df[cols].astype(str) =='s').sum(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Another nice solution is from Nickil Maveli - use numpy:

df['count'] = (df[cols].values=='s').sum(axis=1)

Upvotes: 2

Related Questions