Reputation: 41
I have this dataprocessing problem where I want to count the difference in goals between two teams in different matches. The data looks like this:
matchId teamId eventSec
1 2799331 6718 443.55984
2 2799331 6726 1152.62584
3 2799331 6718 2113.82370
4 2799331 6718 2066.28793
5 2799332 6710 1001.35775
6 2799332 6721 2469.32876
7 2799333 6734 744.35563
So one of these lines represents a goal scored by a team, and matchId represents the match, teamId the teams in the match and eventSec the time of the goal. What i want to accomplish is a variable for each row that shows the difference in score, so the first row in this example data should have a goal difference of 1, row two should have goal difference by 0, row three a goal difference by 1, etc.
I want to apply this to a big dataset with alot of different match and teamIds, so I need a pretty general solution to solve this. I've tried different dplyr solutions but gotten stuck a million times and would really like some help.
Thanks alot in advance!
Upvotes: 3
Views: 70
Reputation: 886948
We can also do this with indexing
library(dplyr)
df1 %>%
group_by(matchId) %>%
mutate(goal_diff =cumsum(c(-1, 1)[(teamId == first(teamId)) + 1 ]))
# A tibble: 7 x 4
# Groups: matchId [3]
# matchId teamId eventSec goal_diff
# <int> <int> <dbl> <dbl>
#1 2799331 6718 444. 1
#2 2799331 6726 1153. 0
#3 2799331 6718 2114. 1
#4 2799331 6718 2066. 2
#5 2799332 6710 1001. 1
#6 2799332 6721 2469. 0
#7 2799333 6734 744. 1
df1 <- structure(list(matchId = c(2799331L, 2799331L, 2799331L, 2799331L,
2799332L, 2799332L, 2799333L), teamId = c(6718L, 6726L, 6718L,
6718L, 6710L, 6721L, 6734L), eventSec = c(443.55984, 1152.62584,
2113.8237, 2066.28793, 1001.35775, 2469.32876, 744.35563)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))
Upvotes: 2
Reputation: 30474
I hope I am following your logic correctly, let me know.
library(tidyverse)
df %>%
group_by(matchId) %>%
mutate(point = if_else(teamId == first(teamId), 1, -1),
goal_diff = cumsum(point))
In addition, if you want to get the score for each match, you could add:
group_by(matchId, teamId) %>%
summarise(score = sum(abs(point)))
Output
# A tibble: 7 x 5
# Groups: matchId [3]
matchId teamId eventSec point goal_diff
<int> <int> <dbl> <dbl> <dbl>
1 2799331 6718 444. 1 1
2 2799331 6726 1153. -1 0
3 2799331 6718 2114. 1 1
4 2799331 6718 2066. 1 2
5 2799332 6710 1001. 1 1
6 2799332 6721 2469. -1 0
7 2799333 6734 744. 1 1
Upvotes: 4
Reputation: 46898
Well, if your dataset is not like million rows, then try this:
x=structure(list(matchId = c(2799331L, 2799331L, 2799331L, 2799331L,
2799332L, 2799332L, 2799333L), teamId = c(6718L, 6726L, 6718L,
6718L, 6710L, 6721L, 6734L), eventSec = c(443.55984, 1152.62584,
2113.8237, 2066.28793, 1001.35775, 2469.32876, 744.35563)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))
library(dplyr)
x %>% group_by(matchId) %>% mutate(goalD=cumsum(ifelse(teamId==teamId[1],1,-1)))
# A tibble: 7 x 4
# Groups: matchId [3]
matchId teamId eventSec goalD
<int> <int> <dbl> <dbl>
1 2799331 6718 444. 1
2 2799331 6726 1153. 0
3 2799331 6718 2114. 1
4 2799331 6718 2066. 2
5 2799332 6710 1001. 1
6 2799332 6721 2469. 0
7 2799333 6734 744. 1
Upvotes: 3