Reputation: 586
I have this dataframe here:
smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent",
"Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent",
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover",
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass",
"Intercepted Pass", "Bad Pass", "Good Pass"), Value = c(2, 2,
2, -3, 2, 2, 2, 2, -2, 1, -2, 2), `Score Sum` = c(2, 4, 6, 3,
2, 4, 6, 8, 6, 1, -1, 2)), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
It contains 4 columns and 12 rows. In the 3rd column are values that were assigned based on the event. In the 4th column, I am trying to add the values to get a rolling sum. So for every time the Opponent had an event, their current value would be added to their previous score sum, and similar for P1/P2/P3. I have been able to roll the sums to how I expect it to be up until row 10.
I have the following code written here:
for (i in 1:nrow(smallerDF)) {
#print(i)
if (smallerDF$Event[i] == "Good Pass") {
smallerDF$Value[i] <- 2
}
if (smallerDF$Event[i] == "Bad Pass") {
smallerDF$Value[i] <- -2
}
if (smallerDF$Event[i] == "Intercepted Pass") {
smallerDF$Value[i] <- 1
}
if (smallerDF$Event[i] == "Turnover") {
smallerDF$Value[i] <- -3
}
if (smallerDF$category[i] == "Opponent") {
#print(i)
if (i != 1 && smallerDF$category[i-1] == "Opponent") {
smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
}
}
else if (smallerDF$category[i] %in% dfList) {
if (i != 1 && smallerDF$category[i-1] %in% dfList) {
smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
}
}
}
This works up until row 10 since I am using [i-1], but I can't figure out how to get row 10 to reference back to row 4 (the last time Opponent was used) to add cell [10,3] onto cell [4,4].
The final result should look like
category Event Value `Score Sum`
<chr> <chr> <dbl> <dbl>
1 Opponent Good Pass 2 2
2 Opponent Good Pass 2 4
3 Opponent Good Pass 2 6
4 Opponent Turnover -3 3
5 P1 Good Pass 2 2
6 P2 Good Pass 2 4
7 P3 Good Pass 2 6
8 P2 Good Pass 2 8
9 P2 Bad Pass -2 6
10 Opponent Intercepted Pass 1 4
11 Opponent Bad Pass -2 2
12 P1 Good Pass 2 8
I tried incorporating the use of this code
dt <- data.table(smallerDF)
newDT <- dt[ , .SD[.N] , by = c("category") ]
but this only returns the very last row for each different value in category, and not the latest/previous occurrence of the category.
Any help would be greatly appreciated. Thanks
Upvotes: 1
Views: 642
Reputation: 8811
Here is a tidyverse
solution
smallerDF %>%
#Removing original values from your data
select(-Value,-`Score Sum`) %>%
#Creating Value variable with case_when
mutate(
Value = case_when(
Event == "Good Pass" ~ 2,
Event == "Bad Pass" ~ -2,
Event == "Intercepted Pass" ~ 1,
Event == "Turnover" ~ -3
),
#Creating auxiliar logical variable (opponent or not oppponent)
Opponent = if_else(category == "Opponent",TRUE,FALSE)
) %>%
#Creating cumulative sum by either Opponent or not oppponent
group_by(Opponent) %>%
mutate(`Score sum` = cumsum(Value))
-output
A tibble: 12 x 4
category Event Value `Score Sum`
<chr> <chr> <dbl> <dbl>
1 Opponent Good Pass 2 2
2 Opponent Good Pass 2 4
3 Opponent Good Pass 2 6
4 Opponent Turnover -3 3
5 P1 Good Pass 2 2
6 P2 Good Pass 2 4
7 P3 Good Pass 2 6
8 P2 Good Pass 2 8
9 P2 Bad Pass -2 6
10 Opponent Intercepted Pass 1 1
11 Opponent Bad Pass -2 -1
12 P1 Good Pass 2 2
Upvotes: 0
Reputation: 160407
I think the base premise here is a grouped-calculation (not easy in a for
loop), and it should be grouped on whether category
is "Opponnent"
or not (lumping "P1"
, "P2"
, etc, together).
Data prep: starting with just the first two columns of your dataset above:
smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", "P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", "Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", "Intercepted Pass", "Bad Pass", "Good Pass")), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))
I'll add a "time" column: some utilities (e.g., base::merge
) do not honor row-order, despite some best efforts. I think it's generally safer anyway to have a "time" component in there to remove accidental reordering. Neither the data.table
nor dplyr
solutions below reorder it inadvertently, but it's still not a horrible idea.
smallerDF$time <- seq_len(nrow(smallerDF))
This is perhaps the least-intuitive of the three, since the grouping functions in R can seem daunting. These include ave
, aggregate
, by
, tapply
, etc. I'll stick with ave
for now, as it is the simplest and perhaps easiest to read.
First, we'll create a "merge/join" table for Value
(other ways exist to bring in these values, see https://stackoverflow.com/a/68999591/3358272; @ViníciusFélix's answer is a great example using case_when
for this purpose). Second, we'll aggregate by "Opponent vs NotOpponent".
values <- data.frame(
Event = c("Good Pass", "Bad Pass", "Intercepted Pass", "Turnover"),
Value = c(2, -2, 1, -3)
)
smallerDF2 <- merge(smallerDF, values, by = "Event", all.x = TRUE, sort = FALSE)
## feel free to verify that `smallerDF2` is no longer in the original order,
## despite `sort=FALSE`. Order is not guaranteed with `base::merge`, period.
smallerDF2 <- smallerDF2[order(smallerDF2$time),]
smallerDF2
# Event category time Value
# 1 Good Pass Opponent 1 2
# 2 Good Pass Opponent 2 2
# 3 Good Pass Opponent 3 2
# 9 Turnover Opponent 4 -3
# 5 Good Pass P1 5 2
# 6 Good Pass P2 6 2
# 7 Good Pass P3 7 2
# 4 Good Pass P2 8 2
# 10 Bad Pass P2 9 -2
# 12 Intercepted Pass Opponent 10 1
# 11 Bad Pass Opponent 11 -2
# 8 Good Pass P1 12 2
smallerDF2$`Score Sum2` <- ave(smallerDF2$Value, smallerDF2$category == "Opponent", FUN = cumsum)
smallerDF2
# Event category time Value Score Sum2
# 1 Good Pass Opponent 1 2 2
# 2 Good Pass Opponent 2 2 4
# 3 Good Pass Opponent 3 2 6
# 9 Turnover Opponent 4 -3 3
# 5 Good Pass P1 5 2 2
# 6 Good Pass P2 6 2 4
# 7 Good Pass P3 7 2 6
# 4 Good Pass P2 8 2 8
# 10 Bad Pass P2 9 -2 6
# 12 Intercepted Pass Opponent 10 1 4
# 11 Bad Pass Opponent 11 -2 2
# 8 Good Pass P1 12 2 8
library(data.table)
smallerDT <- as.data.table(smallerDF)
smallerDT[values, Value := Value, on = .(Event)]
smallerDT[, `Score Sum2` := cumsum(Value), by = .(category == "Opponent")]
library(dplyr)
left_join(smallerDF, values, by = "Event") %>%
group_by(g = (category == "Opponent")) %>%
mutate(`Score Sum` = cumsum(Value)) %>%
ungroup() %>%
select(-g)
Upvotes: 2