CarterB
CarterB

Reputation: 542

Apply function to each row in a dataframe

I am trying to apply the following function for each row in a dataframe. The dataframe looks as follows:

vote_1 vote_2 vote_3 vote_4
a      a       a      b           
b      b       a      b          
b      a       a      b           

I am tring to generate a fourth column to sum the 'votes' of the other columns and produce the winner, as follows:

vote_1 vote_2 vote_3 vote_4 winner_columns
a      a       a      b           a
b      b       a      b           b 
b      a       a      b           draw

I have currently tried:

def winner(x):
    a = new_df.iloc[x].value_counts()['a']
    b = new_df.iloc[x].value_counts()['b']
    if a > b:
        y = 'a'
    elif a < b:
        y = 'b'
    else:
        y = 'draw'
    return y

df['winner_columns'].apply(winner)

However the whole column gets filled with draws. I assume is something with the way I have build the function but can't figure out what

Upvotes: 1

Views: 730

Answers (3)

Carmoreno
Carmoreno

Reputation: 1319

You can use .sum() for counting the votes, then you save in a list the winners, finally you add into dataframe.

numpy_votes = dataframe_votes.to_numpy()    
winner_columns = []
for i in numpy_votes:
  if np.sum(i == 'a') < np.sum(i == 'b'):
     winner_columns.append('b')
  elif np.sum(i == 'a') > np.sum(i == 'b'):
     winner_columns.append('a')
  else:
     winner_columns.append('draw')
    
dataframe_votes['winner_columns'] = winner_columns

Using .sum() method is the fastest way to count elements inside arrays according to this answer.

Output:

    vote_1  vote_2  vote_3  vote_4  winner_columns
0   a        a         a        b       a
1   b        b         a        b       b
2   b        a         a        b       draw

Upvotes: 0

Valdi_Bo
Valdi_Bo

Reputation: 30971

The first problem is that your DataFrame contains sometimes a letter followed by a dot.

So to look for solely 'a' or 'b' you have to replace these dots with an empty string, something like:

df.replace('\.', '', regex=True)

Another problem, which didin't surface in your case, is that a row can contain only 'a' or 'b' and your code should be resistant to absence of particular result in such a source row.

To make your function resistant to such cases, change it to:

def winner(row):
    vc = row.value_counts()
    a = vc.get('a', 0)
    b = vc.get('b', 0)
    if a > b: return 'a'
    elif a < b: return 'b'
    else: return 'draw'

Then you can apply your function, but if you want to apply it to each row (not column), you should pass axis=1.

So, to sum up, change your code to:

df['winner_columns'] = df.replace('\.', '', regex=True).apply(winner, axis=1)

The result, for your sample data, is:

  vote_1 vote_2 vote_3 vote_4 winner_columns
0     a.     a.     a.      b              a
1     b.     b.      a      b              b
2     b.     a.      a      b           draw

Upvotes: 1

jezrael
jezrael

Reputation: 862441

You can use DataFrame.mode and count non missing values by DataFrame.count, if only one use first column else draw in numpy.where:

df1 = df.mode(axis=1)
print (df1)
   0    1
0  a  NaN
1  b  NaN
2  a    b

df['winner_columns'] = np.where(df1.count(axis=1).eq(1), df1[0], 'draw')
print (df)
  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw

Your solution is possible change:

def winner(x):
    s = x.value_counts()
    a = s['a']
    b = s['b']
    if a > b:
        y = 'a'
    elif a < b:
        y = 'b'
    else:
        y = 'draw'
    return y

df['winner_columns'] = df.apply(winner,axis=1)
print (df)
  vote_1 vote_2 vote_3 vote_4 winner_columns
0      a      a      a      b              a
1      b      b      a      b              b
2      b      a      a      b           draw

Upvotes: 2

Related Questions