E. Zeytinci
E. Zeytinci

Reputation: 2643

How can I find the cumulative sum from the different two columns?

I have a dataframe like as follows,

import pandas as pd

t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]

data = {
    'team1_id': t1_ids,
    'team2_id': t2_ids,
    'is_1st_team_won': values
}

df = pd.DataFrame(data)
print(df)


   team1_id  team2_id  is_1st_team_won
0      4991      6899                1
1      6899      6908                1
2      6665      4869                1
3      4991      6899                1
4      7010      6899                1
5      6899      4991                0

The issue what I can't do is find out the winning ratio for each team until the last match. So, as far as I know this issue can be solve with the cumsum() and the shift() functions but I can't find the exact solution. Here is the expected output,

   team1_id  team2_id  is_1st_team_won  t1_winning_ratio  t2_winning_ratio
0      4991      6899                1              0.00              0.00
1      6899      6908                1              0.00              0.00
2      6665      4869                1              0.00              0.00
3      4991      6899                1              1.00              0.50
4      7010      6899                1              0.00              0.33
5      6899      4991                0              0.25              1.00

If you follow the team whose ID is 6899, the team lose their first match. (1st line). They won their second match (2nd line). Therefore, when they were going for the third match, the winning rate was 0.5 (4th line). They also lose the third match, so when they move on to the next match, the winning ratio was 0.33 (5th line). And finally, when they lose the fourth match, the winning ratio was 1/4 = 0.25.

How can I do that? Thanks in advance.

Upvotes: 1

Views: 102

Answers (3)

Lumber Jack
Lumber Jack

Reputation: 622

As I don't know exactly what you except to achieve, here is a straight way to get ratio for a team:

import pandas as pd


t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]

data = {
    'team1_id': t1_ids,
    'team2_id': t2_ids,
    'is_1st_team_won': values
}

df = pd.DataFrame(data)
df['is_2nd_team_won'] = ~df.is_1st_team_won+2


def get_ratio(_id):
    _w=0
    _m =0
    for i in range(len(df)):
        if df.team1_id[i] == _id :
            _m+=1
            if df.is_1st_team_won[i] == 1:
                _w+=1
        elif df.team2_id[i] == _id :
            _m += 1
            if df.is_2nd_team_won[i] == 1:
                _w+=1
        if _m > 0:
            df.loc[i,'team_'+str(_id)+'_winning_ratio'] = _w/_m
        else:
            df.loc[i,'team_'+str(_id)+'_winning_ratio'] = 0
    return(df)
ID = (df.team1_id.append(df.team2_id)).unique()
for _id in ID:
    df = get_ratio(_id)
df

OUTPUT

enter image description here

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150745

Let's try:

# get the winners
df['winner'] = np.where(df['is_1st_team_won']==1, df['team1_id'], df['team2_id'])

# get the winning game with `get_dummies` so far
winning_matches = (pd.get_dummies(df['winner'])
                   .shift(fill_value=0)
                   .cumsum()
                   )

# get the game play so far
game_plays = (pd.get_dummies(df['team1_id'])
                .add(pd.get_dummies(df['team2_id']), fill_value=0)
                .cumsum().shift()
             )

# the winning ratio
winning_ratio = winning_matches.div(game_plays).fillna(0)

# now lookup:
# note that lookup is deprecated since pandas 1.2.0 for the reason I can't understand
df['t1_winning_ratio'] = winning_ratio.lookup(df.index, df['team1_id'])
df['t2_winning_ratio'] = winning_ratio.lookup(df.index, df['team2_id'])

Output:

      team1_id    team2_id    is_1st_team_won    winner    t1_winning_ratio    t2_winning_ratio
--  ----------  ----------  -----------------  --------  ------------------  ------------------
 0        4991        6899                  1      4991                0               0
 1        6899        6908                  1      6899                0               0
 2        6665        4869                  1      6665                0               0
 3        4991        6899                  1      4991                1               0.5
 4        7010        6899                  1      7010                0               0.333333
 5        6899        4991                  0      4991                0.25            1

Upvotes: 1

BoomBoxBoy
BoomBoxBoy

Reputation: 1885

Here is one way of doing that.

my_dict = {}
t1_winning_ratio_list = []
t2_winning_ratio_list = []

for pair in df[['team1_id','team2_id','is_1st_team_won']].values:
    try:
        t1_winning_ratio_list.append(my_dict[pair[0]]['won']/my_dict[pair[0]]['game_count'])
    except:
        t1_winning_ratio_list.append(0)
        
    try:
        t2_winning_ratio_list.append(my_dict[pair[1]]['won']/my_dict[pair[1]]['game_count'])
    except:
        t2_winning_ratio_list.append(0)
    
    if pair[0] in my_dict:
        my_dict[pair[0]]['game_count'] += 1
        if pair[2] == 1:
            my_dict[pair[0]]['won'] += 1
    elif pair[0] not in my_dict:
        my_dict[pair[0]] = {}
        my_dict[pair[0]]['game_count'] = 1
        if pair[2] == 1:
            my_dict[pair[0]]['won'] = 1
        else:
            my_dict[pair[0]]['won'] = 0
            
    if pair[1] in my_dict:
        my_dict[pair[1]]['game_count'] += 1
        if pair[2] == 0:
            my_dict[pair[1]]['won'] += 1
    elif pair[1] not in my_dict:
        my_dict[pair[1]] = {}
        my_dict[pair[1]]['game_count'] = 1
        if pair[2] == 0:
            my_dict[pair[1]]['won'] = 1
        else:
            my_dict[pair[1]]['won'] = 0
    
df['t1_winning_ratio'] = t1_winning_ratio_list
df['t2_winning_ratio'] = t2_winning_ratio_list
        

Upvotes: 1

Related Questions