Reputation: 173
I am looking to add a new column to my current data frame which adds a new sequencing number based on a string of events in a football match.
This my current data frame
head(test_P)
index team.name possession_team.name minute second period possession type.name
1 5 Cardiff City Cardiff City 0 0 1 2 Pass
2 6 Cardiff City Cardiff City 0 2 1 2 Ball Receipt*
3 7 Cardiff City Cardiff City 0 2 1 2 Carry
4 8 Cardiff City Cardiff City 0 3 1 2 Pass
5 9 Cardiff City Cardiff City 0 6 1 2 Ball Receipt*
6 10 Preston North End Cardiff City 0 6 1 2 Duel
7 11 Preston North End Cardiff City 0 6 1 2 Pass
8 12 Preston North End Cardiff City 0 8 1 2 Miscontrol
9 13 Cardiff City Cardiff City 0 8 1 2 Pass
10 14 Cardiff City Cardiff City 0 9 1 2 Ball Receipt*
11 15 Cardiff City Cardiff City 0 9 1 2 Cross
12 16 Preston North End Cardiff City 0 10 1 2 Clearance
13 17 Cardiff City Cardiff City 0 11 1 2 Pass
14 18 Cardiff City Cardiff City 0 13 1 2 Ball Receipt*
15 19 Preston North End Preston North End 0 13 1 3 Ball Recovery
16 20 Preston North End Preston North End 0 13 1 3 Carry
17 21 Preston North End Preston North End 0 21 1 3 Pass
18 22 Preston North End Preston North End 0 22 1 3 Ball Receipt*.
However, I want to add an additional column name after possession called sequence which labels the sequence number of a possession.
Every new possession should start with sequence with value of 1
But if the opposition breaks that sequence with an event/events and the possession value is still the same, the next time the possession team touches the ball it should be a new sequence number e.g 2 or if multiple breaks 3,4 etc
The opposition events should be coded with the same sequence number as the one to which they have broke
E.g Data Below
index team.name possession_team.name minute second period possession type.name sequence
1 5 Cardiff City Cardiff City 0 0 1 2 Pass 1
2 6 Cardiff City Cardiff City 0 2 1 2 Ball Receipt 1
3 7 Cardiff City Cardiff City 0 2 1 2 Carry 1
4 8 Cardiff City Cardiff City 0 3 1 2 Pass 1
5 9 Cardiff City Cardiff City 0 6 1 2 Ball Receipt* 1
6 10 Preston North End Cardiff City 0 6 1 2 Duel 1
7 11 Preston North End Cardiff City 0 6 1 2 Pass 1
8 12 Preston North End Cardiff City 0 8 1 2 Miscontrol 1
9 13 Cardiff City Cardiff City 0 8 1 2 Pass 2
10 14 Cardiff City Cardiff City 0 9 1 2 Ball Receipt 2
11 15 Cardiff City Cardiff City 0 9 1 2 Cross 2
12 16 Preston North End Cardiff City 0 10 1 2 Clearance 2
13 17 Cardiff City Cardiff City 0 11 1 2 Pass 3
14 18 Cardiff City Cardiff City 0 13 1 2 Ball Receipt 3
15 19 Preston North End Preston North End 0 13 1 3 Ball Recovery 1
16 20 Preston North End Preston North End 0 13 1 3 Carry 1
17 21 Preston North End Preston North End 0 21 1 3 Pass 1
18 22 Preston North End Preston North End 0 22 1 3 Ball Receipt 1
I have tried lead and lag functions combined with ifelse statements but can't seem to get the data to work
test <- test %>% mutate(P = ifelse(dplyr::lag(team.id)!=team.id & dplyr::lag(possession) == possession, dplyr::lag(seq_id) + 1,
ifelse(dplyr::lead(team.id)!=team.id & dplyr::lead(possession)!=possession , seq_id, 1)))
Any help would be greatly appreciated and apologies for the untidiness of this question
Upvotes: 0
Views: 60
Reputation: 2374
The following feels very hacky, but might do the job.
The logic is the following:
flip
variable that is 1/2 every time team.name "flips" and 0 otherwise.cum_sum_flip
, the cumulative sum over flip
. Added 1 so that it starts at 1 and not 0.sequence
by taking the floor()
from cum_sum_flip
so that at every second flip, the sequence is increased.Notes:
match
or something to make sure that when a whole new match starts it starts counting from 0 again.library(dplyr)
test_P %>%
mutate(flip = (lag(team.name) != team.name) %>% replace_na(0) * 1/2,
.after = possession
) %>% group_by(possession) %>%
mutate(cum_sum_flip = cumsum(flip)+1,
sequence = floor(cum_sum_flip),
.after = possession
)
Results:
# A tibble: 18 x 11
# Groups: possession [2]
index team.name possession_team.name minute second period possession cum_sum_flip sequence flip type.name
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 5 Cardiff City Cardiff City 0 0 1 2 1 1 0 Pass
2 6 Cardiff City Cardiff City 0 2 1 2 1 1 0 Ball Receipt*
3 7 Cardiff City Cardiff City 0 2 1 2 1 1 0 Carry
4 8 Cardiff City Cardiff City 0 3 1 2 1 1 0 Pass
5 9 Cardiff City Cardiff City 0 6 1 2 1 1 0 Ball Receipt*
6 10 Preston North End Cardiff City 0 6 1 2 1.5 1 0.5 Duel
7 11 Preston North End Cardiff City 0 6 1 2 1.5 1 0 Pass
8 12 Preston North End Cardiff City 0 8 1 2 1.5 1 0 Miscontrol
9 13 Cardiff City Cardiff City 0 8 1 2 2 2 0.5 Pass
10 14 Cardiff City Cardiff City 0 9 1 2 2 2 0 Ball Receipt*
11 15 Cardiff City Cardiff City 0 9 1 2 2 2 0 Cross
12 16 Preston North End Cardiff City 0 10 1 2 2.5 2 0.5 Clearance
13 17 Cardiff City Cardiff City 0 11 1 2 3 3 0.5 Pass
14 18 Cardiff City Cardiff City 0 13 1 2 3 3 0 Ball Receipt*
15 19 Preston North End Preston North End 0 13 1 3 1.5 1 0.5 Ball Recovery
16 20 Preston North End Preston North End 0 13 1 3 1.5 1 0 Carry
17 21 Preston North End Preston North End 0 21 1 3 1.5 1 0 Pass
18 22 Preston North End Preston North End 0 22 1 3 1.5 1 0 Ball Receipt*
test_P <- tribble(
~index, ~team.name, ~possession_team.name, ~minute, ~second, ~period, ~possession, ~type.name,
5 , "Cardiff City", "Cardiff City", 0, 0, 1, 2, "Pass",
6 , "Cardiff City", "Cardiff City", 0, 2, 1, 2, "Ball Receipt*",
7 , "Cardiff City", "Cardiff City", 0, 2, 1, 2, "Carry",
8 , "Cardiff City", "Cardiff City", 0, 3, 1, 2, "Pass",
9 , "Cardiff City", "Cardiff City", 0, 6, 1, 2, "Ball Receipt*",
10, "Preston North End", "Cardiff City", 0, 6, 1, 2, "Duel",
11, "Preston North End", "Cardiff City", 0, 6, 1, 2, "Pass",
12, "Preston North End", "Cardiff City", 0, 8, 1, 2, "Miscontrol",
13, "Cardiff City", "Cardiff City", 0, 8, 1, 2, "Pass",
14, "Cardiff City", "Cardiff City", 0, 9, 1, 2, "Ball Receipt*",
15, "Cardiff City", "Cardiff City", 0, 9, 1, 2, "Cross",
16, "Preston North End", "Cardiff City", 0, 10, 1, 2, "Clearance",
17, "Cardiff City", "Cardiff City", 0, 11, 1, 2, "Pass",
18, "Cardiff City", "Cardiff City", 0, 13, 1, 2, "Ball Receipt*",
19, "Preston North End", "Preston North End", 0, 13, 1, 3, "Ball Recovery",
20, "Preston North End", "Preston North End", 0, 13, 1, 3, "Carry",
21, "Preston North End", "Preston North End", 0, 21, 1, 3, "Pass",
22, "Preston North End", "Preston North End", 0, 22, 1, 3, "Ball Receipt*")
Upvotes: 1