In python, create a new columns using past of other variable

Question

I have a pandas data frame about a certain sport competition. I have 6 teams: A, B, C, D, E and F. The competition has 38 weeks. In each weak, we have a home owner team against a visiting team. I know the number of team matches doesn't match 38 weeks, but it's just to illustrate the problem. My df is like this:

df
ownerh_team  visit_team   some_var        weak    some_var2    score
          A           B          1           1            5        3
          C           D          2           1            3        3
          E           F          1           1            1        1

          A           D          4          38            2        0
          B           D          2          38            3        0
          F           E          1          38            1        1

The score variable is related to the owner home team (ownerh_team). If the owner home team wins, then the score is 3. If the visit team wins, the score is 0. And if the result of the match is tied, the score is 1.

I would like to construct a function new_var_func that creates a new column relating the past weaks. I other words, for each row of df (or equivalently for each home owner team) and for its respective week, I want to know how many times the home team scored 3 in the past 5 weeks.

I would like a function with this input structure:

def new_var_func(data = df, column = 'name_var' , n_past_weeks = 1 , score_value = 3):
    corpus

Note that the output of this functions is the same df but with one new column named 'name_var'. The parameter n_past_weeks = y is the numbers of past weeks to be investigated. The score_value = x means that I would like to know how many times the home team has a score = x in the past y weeks.

Perhaps a solution to creating new way variables as explained does not necessarily have to be using a function. I accept other solutions, but I will be grateful if you can use a function.

Massifox · Accepted Answer

Given the following dataframe:

df = pd.DataFrame([
    ['A','B',1,1,5,3],
    ['C','D',2,1,3,3],
    ['E','F',1,1,1,1],
    ['A','D',4,2,2,3],
    ['B','D',2,2,3,3],
    ['F','E',1,2,1,1],
    ['A','D',4,3,2,3],
    ['B','D',2,3,3,0],
    ['F','E',1,3,1,1]],
    columns=['ownerh_team','visit_team','some_var','week','some_var2','score'])

  ownerh_team visit_team  some_var  week  some_var2  score
0           A          B         1     1          5      3
1           C          D         2     1          3      3
2           E          F         1     1          1      1
3           A          D         4     2          2      3
4           B          D         2     2          3      3
5           F          E         1     2          1      1
6           A          D         4     3          2      3
7           B          D         2     3          3      0
8           F          E         1     3          1      1

I used pandas.Dataframe.query to filter the dataframe, and the pandas.Dataframe.apply function to iterate it efficiently. Try this code:

def new_var_func(data=df, column='name_var', n_past_weeks=1, score_value= 3):
    df[column] = df.apply(lambda row: len(df[(df.ownerh_team == row.ownerh_team) & 
                                                (df.score == score_value) & 
                                                (df.week >= row.week - n_past_weeks) &
                                                (df.week <= row.week)]), axis=1)
    return df



df = new_var_func(data=df, column='name_var', n_past_weeks=1, score_value=3)

And gives:

  ownerh_team visit_team  some_var  week  some_var2  score  name_var
0           A          B         1     1          5      3         1
1           C          D         2     1          3      3         1
2           E          F         1     1          1      1         0
3           A          D         4     2          2      3         2
4           B          D         2     2          3      3         1
5           F          E         1     2          1      1         0
6           A          D         4     3          2      3         2
7           B          D         2     3          3      0         1
8           F          E         1     3          1      1         0

Note: I considered that you want to also count the results of the current week, if you want to exclude the week you run from the count play replace df.week <= row.week with df.week < row.week, or modify the queries if I misinterpreted some part of your request

In python, create a new columns using past of other variable

Answers (1)

Related Questions