Tiago Tristão
Tiago Tristão

Reputation: 65

How to create an indexed column based in other columns in a data.table?

I hava a data.table DF as following. Third coulmn is the team which have the ball possesion at that time (described by minutes and secs).

    minute second    teamId   isGoal     
 1:     10     22 Atletico MG      0
 2:     10     26 Atletico MG      0
 3:     10     30 Atletico MG      0
 4:     10     33 Atletico MG      0
 5:     10     35 Atletico MG      0
 6:     10     37 Atletico MG      0
 7:     10     38 Atletico MG      1
 8:     10     40 Atletico GO      0
 9:     10     42 Atletico GO      0
10:     10     48 Atletico GO      1
11:     10     51 Atletico MG      0
12:     10     54 Atletico MG      1
13:     10     60 Atletico MG      0

I would like to create twou new columns. Each one summing number of the goals of each team. For exemple, the output should be:

        minute second    teamId   isGoal  AtleticoMG AtleticoGO    
 1:     10     22 Atletico MG      0               0          0
 2:     10     26 Atletico MG      0               0          0
 3:     10     30 Atletico MG      0               0          0
 4:     10     33 Atletico MG      0               0          0
 5:     10     35 Atletico MG      0               0          0
 6:     10     37 Atletico MG      0               0          0
 7:     10     38 Atletico MG      1               1          0
 8:     10     40 Atletico GO      0               1          0
 9:     10     42 Atletico GO      0               1          0
10:     10     48 Atletico GO      1               1          1
11:     10     51 Atletico MG      0               1          1
11:     10     51 Atletico MG      0               1          1
12:     10     54 Atletico MG      1               2          1
13:     10     60 Atletico MG      0               2          1

A want to avoid for loops. I'm sure it must easy to do in data.table, but how?

Upvotes: 0

Views: 40

Answers (1)

bmosov01
bmosov01

Reputation: 599

Here is a solution using dplyr and a "toy" example that is similar to the data frame you give in the opening post.

Basically, basically you want a column for each team with indicators that are one if that team has scored a goal in a particular row of the dataframe. Then you use the cumsum() function to cumulate within these newly-generated columns.

library(dplyr)
x <- data.frame( teamID=c('A', 'A', 'B', 'A', 'A', 'B', 'B', 'B', 'A'),
                 isGoal=c(0,0,1,0,1,0,0,1, 0) ) %>%
mutate( AGoal = cumsum( isGoal*(teamID=='A') ),
        BGoal = cumsum( isGoal*(teamID=='B') ) )

Output:

teamID isGoal AGoal BGoal
A      0      0     0
A      0      0     0
B      1      0     1
A      0      0     1
A      1      1     1
B      0      1     1
B      0      1     1
B      1      1     2
A      0      1     2

Upvotes: 1

Related Questions