Raving SonofCodes
Raving SonofCodes

Reputation: 69

Python: Pandas dataframe column evaluation

This python code snip :

print(dfs[0].iloc[[1,3,5,7,9,11,13,15,17,19],[1]])

Yields the below Pandas column

score
1   0 - 3
3   7 - 0
5   1 - 1
7   3 - 2
9   3 - 3
11  3 - 0
13  2 - 2
15  0 - 1
17  4 - 1
19  1 - 2

What i need is to further evaluate the score line to this format:

     score      outcome
    1   0 - 3   2 (team2 won)
    3   7 - 0   1 (team1 won)
    5   1 - 1   x (ended in a draw)
    7   3 - 2   1
    9   3 - 3   x
    11  3 - 0   1
    13  2 - 2   x
    15  0 - 1   2
    17  4 - 1   1
    19  1 - 2   2

And finally just output this as a single 21x1x1x212.

My knowledge is rudimentary at best, but I have to do this. Do I need to split the score column and create 2 different columns whose values can then be compared and the resulted printed? There must be a better way.

Upvotes: 2

Views: 403

Answers (3)

sophocles
sophocles

Reputation: 13831

You can create series with the scores of each team using str.split() and grabbing each team's score using [0] or [1] and assign back to new column using np.where:

t1 , t2 = df['score'].str.split(' - ',expand=True)[0] , df['score'].str.split(' - ',expand=True)[1]

df['outcome'] = np.where(t1>t2,1,
                         np.where(t2>t1,2,'x'))

Will get you back:

Out[31]: 
   score outcome
0  0 - 3       2
1  7 - 0       1
2  1 - 1       x
3  3 - 2       1
4  3 - 3       x
5  3 - 0       1
6  2 - 2       x
7  0 - 1       2
8  4 - 1       1
9  1 - 2       2

Upvotes: 3

simon chou
simon chou

Reputation: 11

You can use pd.Series.apply function.

e.g.

import re 
# write a funciton apply to each elements
def judge(x):
    m = re.search(r'(\d+ - \d+)', x)
    score1, score2 = int(m.group(1)), int(m.group(2))
    if score1 > score2:
        o = 1
    elif score1 < score2:
        o = 2
    else:
        o = 'x'
    return o

# apply the function to element in df['score'] 
df['team_win'] = df['score'].apply(judge)

df
   col  score team_win
0    1  0 - 3        2
1    3  7 - 0        1
2    5  1 - 1        x
3    7  3 - 2        1
4    9  3 - 3        x
5   11  3 - 0        1
6   13  2 - 2        x
7   15  0 - 1        2
8   17  4 - 1        1
9   19  1 - 2        2

Upvotes: 1

Umar.H
Umar.H

Reputation: 23099

on the assumption you have only two teams to compare.

use str.split with idxmax() and a boolean to check for draws.

import numpy as np 
s = df['score'].str.split('-',expand=True).astype(int)
df['team_win'] = np.where(s[0] == s[1], 0,s.idxmax(1) + 1)

   score  team_win
0  0 - 3         2
1  7 - 0         1
2  1 - 1         0
3  3 - 2         1
4  3 - 3         0
5  3 - 0         1
6  2 - 2         0
7  0 - 1         2
8  4 - 1         1
9  1 - 2         2

Upvotes: 0

Related Questions