Reputation: 533
I have a pandas data frame about a certain sport competition. I have 6 teams: A, B, C, D, E and F. The competition has 38 weeks. In each weak, we have a home owner team against a visiting team. I know the number of team matches doesn't match 38 weeks, but it's just to illustrate the problem. My df is like this:
df
ownerh_team visit_team some_var weak some_var2 score
A B 1 1 5 3
C D 2 1 3 3
E F 1 1 1 1
A D 4 38 2 0
B D 2 38 3 0
F E 1 38 1 1
The score variable is related to the owner home team (ownerh_team). If the owner home team wins, then the score is 3. If the visit team wins, the score is 0. And if the result of the match is tied, the score is 1.
I would like to construct a function new_var_func that creates a new column relating the past weaks. I other words, for each row of df (or equivalently for each home owner team) and for its respective week, I want to know how many times the home team scored 3 in the past 5 weeks.
I would like a function with this input structure:
def new_var_func(data = df, column = 'name_var' , n_past_weeks = 1 , score_value = 3):
corpus
Note that the output of this functions is the same df but with one new column named 'name_var'. The parameter n_past_weeks = y is the numbers of past weeks to be investigated. The score_value = x means that I would like to know how many times the home team has a score = x in the past y weeks.
Perhaps a solution to creating new way variables as explained does not necessarily have to be using a function. I accept other solutions, but I will be grateful if you can use a function.
Upvotes: 1
Views: 112
Reputation: 4497
Given the following dataframe:
df = pd.DataFrame([
['A','B',1,1,5,3],
['C','D',2,1,3,3],
['E','F',1,1,1,1],
['A','D',4,2,2,3],
['B','D',2,2,3,3],
['F','E',1,2,1,1],
['A','D',4,3,2,3],
['B','D',2,3,3,0],
['F','E',1,3,1,1]],
columns=['ownerh_team','visit_team','some_var','week','some_var2','score'])
ownerh_team visit_team some_var week some_var2 score
0 A B 1 1 5 3
1 C D 2 1 3 3
2 E F 1 1 1 1
3 A D 4 2 2 3
4 B D 2 2 3 3
5 F E 1 2 1 1
6 A D 4 3 2 3
7 B D 2 3 3 0
8 F E 1 3 1 1
I used pandas.Dataframe.query to filter the dataframe, and the pandas.Dataframe.apply function to iterate it efficiently. Try this code:
def new_var_func(data=df, column='name_var', n_past_weeks=1, score_value= 3):
df[column] = df.apply(lambda row: len(df[(df.ownerh_team == row.ownerh_team) &
(df.score == score_value) &
(df.week >= row.week - n_past_weeks) &
(df.week <= row.week)]), axis=1)
return df
df = new_var_func(data=df, column='name_var', n_past_weeks=1, score_value=3)
And gives:
ownerh_team visit_team some_var week some_var2 score name_var
0 A B 1 1 5 3 1
1 C D 2 1 3 3 1
2 E F 1 1 1 1 0
3 A D 4 2 2 3 2
4 B D 2 2 3 3 1
5 F E 1 2 1 1 0
6 A D 4 3 2 3 2
7 B D 2 3 3 0 1
8 F E 1 3 1 1 0
Note: I considered that you want to also count the results of the current week, if you want to exclude the week you run from the count play replace df.week <= row.week
with df.week < row.week
, or modify the queries if I misinterpreted some part of your request
Upvotes: 2