Reputation: 470
I'm trying to optimise expected goals (xG) in football (soccer) matches by measuring the sum of squared differences (SSD) against the individual match timeslots. Assume each match is divided into k timeslots, each with constant probabilities of a goal scored by either team or no goal.
**Sample SSD for individual match_i with Final score [0-0]**
xG is unique in each match.
Team1 and Team2 have the following xG, multiplied by an arbitrary multiplier M.
Team1 = xG_1*M
Team2 = xG_2*M
prob_1 = [1-(xG_1 + xG_2)/k, xG_1/k, xG_2/k]
where prob_1 holds the constant probabilities of a Draw (no goal), a Team1 Goal or a Team2 Goal for each of the k Timeslots in match_i, and sum(prob_1) = 1.
To measure SSD for match_i:
import numpy as np
x1 = np.array([1, 0, 0])  # outcome vector: no goal scored in the timeslot
x2 = np.array([0, 1, 0])  # outcome vector: Home Team scores in the timeslot
x3 = np.array([0, 0, 1])  # outcome vector: Away Team scores in the timeslot
# Using xG_Team1 and xG_Team2 from the table below.
y = np.array([1-(xG_1 + xG_2)/k, xG_1/k, xG_2/k])
total_timeslot = 180
Home_Goal = []  # timeslots with a Home goal (none scored)
Away_Goal = []  # timeslots with an Away goal (none scored)
def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for slot in range(total_timeslot):
        if slot in Home_Goal:
            ssd.append(sum((x2 - y)**2))
        elif slot in Away_Goal:
            ssd.append(sum((x3 - y)**2))
        else:
            ssd.append(sum((x1 - y)**2))
    return ssd
SSD_Result = sum_squared_diff(x1, x2, x3, y)
sum(SSD_Result)
For example, using the xGs of index 0 from the table below and M = 1:
First, for k = 187 timeslots, xG per timeslot becomes 1.4405394105672238/187 and 1.3800950382265837/187, and these are constant throughout the match.
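y_0 = np.array([1-(0.007703419308 + 0.007380187370), 0.007703419308, 0.007380187370])
(The values 0.007703419308 and 0.007380187370 are already xG/187, so they are not divided by 187 again.)
Using y_0 in the function above with total_timeslot = 187, SSD_Result for xG at index 0 is approximately 0.0638.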
As SSD goes this looks promising, but then again the match ended goalless and the two teams have almost identical xG figures.
Now I want to apply the same procedure to xG index 1, xG index 2, ..., xG index 10000.
Then take the total SSD and, depending on its value, change the arbitrary multiplier M until the best result is achieved.
**Question**
How can I convert the xG in each match to a prob_1-like array and pass it into the function above?
i.e. prob_1...prob_10000. Here's a sample of xG.
individual_match_xG.tail()
xG_Team1 xG_Team2
0 1.440539 1.380095
1 2.123673 0.946116
2 1.819697 0.921660
3 1.132676 1.375717
4 1.244837 1.269933
So in conclusion,
* There are 10000 final scores with xG that I want to turn into 10000 prob_1 arrays, then get an SSD for each.
* k is the total number of timeslots per match and is constant for a given interval length. For 30-second timeslots, k is 180. With 7/2 minutes of injury time, k = 187.
* Home_Goal, Away_Goal and No_Goal represent the prob. of a single goal scored per timeslot by the respective team, or of no goal being scored.
* Only one Goal can be scored per timeslot.
Upvotes: 0
Views: 1111
Reputation: 23753
import numpy as np
# constants
M = 1.0
k = 180 # number of timeslots
x1 = [1,0,0] # prob. of No goal scored per timeslot.
x2 = [0,1,0] # prob. of Home Team scoring per timeslot.
x3 = [0,0,1] # prob. of Away Team scoring per timeslot.
# seven scores
final_scores = [[2,1],[3,3],[1,2],[1,1],[2,1],[4,0],[2,3]]
# time slots with goals
Home_Goal = [2, 3]
Away_Goal = [4]
# numpy arrays of the data
final_scores = np.array(final_scores) # team_1 is [:,0], team_2 is [:,1]
home_goal = np.array(Home_Goal)
away_goal = np.array(Away_Goal)
# fudge factor
adj_scores = final_scores * M # shape --> (# of scores, 2)
# calculate prob_1
slot_goal_probability = adj_scores / k # xG_n / k
slot_draw_probability = 1 - slot_goal_probability.sum(axis = 1) #1-(xG_1+xG_2)/k
# y for all scores
y = np.concatenate((slot_draw_probability[:,None], slot_goal_probability), axis=1)
# ssd for x2, x3, x1
home_ssd = np.sum(np.square(x2 - y), axis=1)
away_ssd = np.sum(np.square(x3 - y), axis=1)
draw_ssd = np.sum(np.square(x1 - y), axis=1)
ssd = np.zeros((y.shape[0],k))
ssd += draw_ssd[:,None] # all time slices a draw
ssd[:,home_goal] = home_ssd[:,None] # time slots with goal for home games
ssd[:,away_goal] = away_ssd[:,None] # time slots with goal for away games
Sum of probabilities (prob_1 in your example) for each score:
>>> y.sum(axis=1)
array([1., 1., 1., 1., 1., 1., 1.])
ssd's shape is (# of scores, 180): it holds the squared difference for every time slot of every score.
>>> ssd.sum(axis=1)
array([5.92222222, 6. , 5.93333333, 5.93333333, 5.92222222,
5.95555556, 5.96666667])
>>> for thing in ssd.sum(axis=1):
...     print(thing)
5.922222222222222
6.000000000000001
5.933333333333332
5.933333333333337
5.922222222222222
5.955555555555557
5.966666666666663
>>>
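The code above reuses the same Home_Goal and Away_Goal time slots for every score. If each match has its own goal times, one possible way to handle that is to build a boolean mask per team and pick the per-slot value with np.where. This is only a sketch: the names slot_mask, home_slots and away_slots, and the sample values, are made up for illustration and are not from your data.

```python
import numpy as np

k = 180  # time slots per match, as in the code above

# Hypothetical inputs: xG-like values and the slots in which each side scored.
xg = np.array([[2.0, 1.0], [1.0, 2.0], [0.0, 0.0]])
home_slots = [[2, 3], [10], []]
away_slots = [[4], [20, 90], []]

def slot_mask(slots_per_match, k):
    """Boolean array of shape (n_matches, k), True where a goal was scored."""
    mask = np.zeros((len(slots_per_match), k), dtype=bool)
    for i, slots in enumerate(slots_per_match):
        mask[i, np.asarray(slots, dtype=int)] = True
    return mask

home_mask = slot_mask(home_slots, k)
away_mask = slot_mask(away_slots, k)

# Per-slot probabilities [no goal, home goal, away goal] for every match.
p_goal = xg / k
y = np.column_stack((1 - p_goal.sum(axis=1), p_goal))  # shape (n_matches, 3)

x1, x2, x3 = np.eye(3)  # rows are [1,0,0], [0,1,0], [0,0,1]
draw_ssd = np.sum(np.square(x1 - y), axis=1)
home_ssd = np.sum(np.square(x2 - y), axis=1)
away_ssd = np.sum(np.square(x3 - y), axis=1)

# Choose the value for each slot from that match's own masks.
ssd = np.where(home_mask, home_ssd[:, None],
               np.where(away_mask, away_ssd[:, None], draw_ssd[:, None]))
match_ssd = ssd.sum(axis=1)  # one total per match
print(match_ssd)
```

The first row uses the same inputs as the first score above, so its total should agree with the first value of ssd.sum(axis=1).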
Test y with your function:
>>> y
array([[0.98333333, 0.01111111, 0.00555556],
[0.96666667, 0.01666667, 0.01666667],
[0.98333333, 0.00555556, 0.01111111],
[0.98888889, 0.00555556, 0.00555556],
[0.98333333, 0.01111111, 0.00555556],
[0.97777778, 0.02222222, 0. ],
[0.97222222, 0.01111111, 0.01666667]])
>>> for prob in y:
...     print(sum(sum_squared_diff(x1, x2, x3, prob)))
5.922222222222252
6.000000000000045
5.933333333333363
5.933333333333391
5.922222222222252
5.955555555555599
5.966666666666613
>>>
Some, hopefully, minor differences. I'll put them down to floating point or rounding errors in the 1e-14 range.
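If you would rather check that than eyeball it, something like the following should do. It is a small self-contained sketch: loop_total is my own stand-in for your loop function, and only three of the scores are used. The two versions add the same numbers in a different order, which is the likely source of the tiny differences.

```python
import numpy as np

k = 180
home_goal, away_goal = [2, 3], [4]
x1, x2, x3 = np.eye(3)  # rows are [1,0,0], [0,1,0], [0,0,1]
scores = np.array([[2, 1], [3, 3], [4, 0]])

p_goal = scores / k
y = np.column_stack((1 - p_goal.sum(axis=1), p_goal))

def loop_total(prob):
    """Slot-by-slot accumulation, mirroring the loop in the question."""
    total = 0.0
    for slot in range(k):
        if slot in home_goal:
            total += np.sum((x2 - prob) ** 2)
        elif slot in away_goal:
            total += np.sum((x3 - prob) ** 2)
        else:
            total += np.sum((x1 - prob) ** 2)
    return total

loop_result = np.array([loop_total(prob) for prob in y])

# Vectorized version, same idea as the answer's code.
ssd = np.empty((len(y), k))
ssd[:] = np.sum((x1 - y) ** 2, axis=1)[:, None]
ssd[:, home_goal] = np.sum((x2 - y) ** 2, axis=1)[:, None]
ssd[:, away_goal] = np.sum((x3 - y) ** 2, axis=1)[:, None]
vec_result = ssd.sum(axis=1)

print(np.abs(loop_result - vec_result).max())  # largest absolute difference
print(np.allclose(loop_result, vec_result))    # equal within default tolerances
```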
Maybe someone will see this and fine tune it a bit with more optimizations in their own answer. Once I worked it out I didn't look for further improvements.
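For the "change M until the best result" part of the question, here is a rough sketch of one way to go about it. Because the per-slot probabilities are constant within a match, the match total only depends on how many slots had a home goal, an away goal or no goal, so the (matches, k) array is not needed. The function total_ssd, the goals array and the range of candidate M values are my assumptions; only the five xG rows and the 0-0 result of the first match come from the question. With pandas, the xg array could come from individual_match_xG[["xG_Team1", "xG_Team2"]].to_numpy().

```python
import numpy as np

k = 187  # 30-second time slots including injury time, per the question

# First five xG rows from the question.
xg = np.array([[1.440539, 1.380095],
               [2.123673, 0.946116],
               [1.819697, 0.921660],
               [1.132676, 1.375717],
               [1.244837, 1.269933]])

# Hypothetical goals per match (home, away); only the first 0-0 is real.
goals = np.array([[0, 0], [2, 1], [1, 0], [1, 1], [0, 2]])

def total_ssd(M, xg, goals, k):
    """Total SSD over all matches for multiplier M."""
    p_goal = xg * M / k
    y = np.column_stack((1 - p_goal.sum(axis=1), p_goal))  # (n, 3)
    outcomes = np.eye(3)  # no goal, home goal, away goal
    # per_outcome[i, j] = sum((outcomes[j] - y[i]) ** 2)
    per_outcome = np.sum((outcomes[None, :, :] - y[:, None, :]) ** 2, axis=2)
    # Number of slots per match with no goal, a home goal, an away goal.
    counts = np.column_stack((k - goals.sum(axis=1), goals))
    return float(np.sum(per_outcome * counts))

# Simple grid scan over candidate multipliers (range chosen arbitrarily).
candidates = np.linspace(0.5, 1.5, 101)
totals = np.array([total_ssd(M, xg, goals, k) for M in candidates])
best_M = candidates[np.argmin(totals)]
print(best_M, totals.min())
```

A grid scan is crude but easy to inspect. With 10000 matches the function is still cheap, so a finer grid or a proper scalar minimizer could replace it.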
Numpy Basics:
Indexing
Broadcasting
Upvotes: 1