Reputation: 2643
I have a dataframe like as follows,
import pandas as pd
t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]
data = {
'team1_id': t1_ids,
'team2_id': t2_ids,
'is_1st_team_won': values
}
df = pd.DataFrame(data)
print(df)
team1_id team2_id is_1st_team_won
0 4991 6899 1
1 6899 6908 1
2 6665 4869 1
3 4991 6899 1
4 7010 6899 1
5 6899 4991 0
The issue what I can't do is find out the winning ratio for each team until the last match. So, as far as I know this issue can be solve with the cumsum()
and the shift()
functions but I can't find the exact solution. Here is the expected output,
team1_id team2_id is_1st_team_won t1_winning_ratio t2_winning_ratio
0 4991 6899 1 0.00 0.00
1 6899 6908 1 0.00 0.00
2 6665 4869 1 0.00 0.00
3 4991 6899 1 1.00 0.50
4 7010 6899 1 0.00 0.33
5 6899 4991 0 0.25 1.00
If you follow the team whose ID is 6899, the team lose their first match. (1st line). They won their second match (2nd line). Therefore, when they were going for the third match, the winning rate was 0.5 (4th line). They also lose the third match, so when they move on to the next match, the winning ratio was 0.33 (5th line). And finally, when they lose the fourth match, the winning ratio was 1/4 = 0.25.
How can I do that? Thanks in advance.
Upvotes: 1
Views: 102
Reputation: 622
As I don't know exactly what you except to achieve, here is a straight way to get ratio for a team:
import pandas as pd
t1_ids = [4991, 6899, 6665, 4991, 7010, 6899]
t2_ids = [6899, 6908, 4869, 6899, 6899, 4991]
values = [1, 1, 1, 1, 1, 0]
data = {
'team1_id': t1_ids,
'team2_id': t2_ids,
'is_1st_team_won': values
}
df = pd.DataFrame(data)
df['is_2nd_team_won'] = ~df.is_1st_team_won+2
def get_ratio(_id):
_w=0
_m =0
for i in range(len(df)):
if df.team1_id[i] == _id :
_m+=1
if df.is_1st_team_won[i] == 1:
_w+=1
elif df.team2_id[i] == _id :
_m += 1
if df.is_2nd_team_won[i] == 1:
_w+=1
if _m > 0:
df.loc[i,'team_'+str(_id)+'_winning_ratio'] = _w/_m
else:
df.loc[i,'team_'+str(_id)+'_winning_ratio'] = 0
return(df)
ID = (df.team1_id.append(df.team2_id)).unique()
for _id in ID:
df = get_ratio(_id)
df
OUTPUT
Upvotes: 0
Reputation: 150745
Let's try:
# get the winners
df['winner'] = np.where(df['is_1st_team_won']==1, df['team1_id'], df['team2_id'])
# get the winning game with `get_dummies` so far
winning_matches = (pd.get_dummies(df['winner'])
.shift(fill_value=0)
.cumsum()
)
# get the game play so far
game_plays = (pd.get_dummies(df['team1_id'])
.add(pd.get_dummies(df['team2_id']), fill_value=0)
.cumsum().shift()
)
# the winning ratio
winning_ratio = winning_matches.div(game_plays).fillna(0)
# now lookup:
# note that lookup is deprecated since pandas 1.2.0 for the reason I can't understand
df['t1_winning_ratio'] = winning_ratio.lookup(df.index, df['team1_id'])
df['t2_winning_ratio'] = winning_ratio.lookup(df.index, df['team2_id'])
Output:
team1_id team2_id is_1st_team_won winner t1_winning_ratio t2_winning_ratio
-- ---------- ---------- ----------------- -------- ------------------ ------------------
0 4991 6899 1 4991 0 0
1 6899 6908 1 6899 0 0
2 6665 4869 1 6665 0 0
3 4991 6899 1 4991 1 0.5
4 7010 6899 1 7010 0 0.333333
5 6899 4991 0 4991 0.25 1
Upvotes: 1
Reputation: 1885
Here is one way of doing that.
my_dict = {}
t1_winning_ratio_list = []
t2_winning_ratio_list = []
for pair in df[['team1_id','team2_id','is_1st_team_won']].values:
try:
t1_winning_ratio_list.append(my_dict[pair[0]]['won']/my_dict[pair[0]]['game_count'])
except:
t1_winning_ratio_list.append(0)
try:
t2_winning_ratio_list.append(my_dict[pair[1]]['won']/my_dict[pair[1]]['game_count'])
except:
t2_winning_ratio_list.append(0)
if pair[0] in my_dict:
my_dict[pair[0]]['game_count'] += 1
if pair[2] == 1:
my_dict[pair[0]]['won'] += 1
elif pair[0] not in my_dict:
my_dict[pair[0]] = {}
my_dict[pair[0]]['game_count'] = 1
if pair[2] == 1:
my_dict[pair[0]]['won'] = 1
else:
my_dict[pair[0]]['won'] = 0
if pair[1] in my_dict:
my_dict[pair[1]]['game_count'] += 1
if pair[2] == 0:
my_dict[pair[1]]['won'] += 1
elif pair[1] not in my_dict:
my_dict[pair[1]] = {}
my_dict[pair[1]]['game_count'] = 1
if pair[2] == 0:
my_dict[pair[1]]['won'] = 1
else:
my_dict[pair[1]]['won'] = 0
df['t1_winning_ratio'] = t1_winning_ratio_list
df['t2_winning_ratio'] = t2_winning_ratio_list
Upvotes: 1