samrizz4
samrizz4

Reputation: 586

Add Value From One Cell to Another Based on Criteria From Different Column

I have this dataframe here:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", 
"Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", 
"P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", 
"Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", 
"Intercepted Pass", "Bad Pass", "Good Pass"), Value = c(2, 2, 
2, -3, 2, 2, 2, 2, -2, 1, -2, 2), `Score Sum` = c(2, 4, 6, 3, 
2, 4, 6, 8, 6, 1, -1, 2)), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

It contains 4 columns and 12 rows. In the 3rd column are values that were assigned based on the event. In the 4th column, I am trying to add the values to get a rolling sum. So for every time the Opponent had an event, their current value would be added to their previous score sum, and similar for P1/P2/P3. I have been able to roll the sums to how I expect it to be up until row 10.

I have the following code written here:

for (i in 1:nrow(smallerDF)) {
  #print(i)
  if (smallerDF$Event[i] == "Good Pass") {
    smallerDF$Value[i] <- 2
  }
  
  if (smallerDF$Event[i] == "Bad Pass") {
    smallerDF$Value[i] <- -2
  }
  
  if (smallerDF$Event[i] == "Intercepted Pass") {
    smallerDF$Value[i] <- 1
  }
  
  if (smallerDF$Event[i] == "Turnover") {
    smallerDF$Value[i] <- -3
  }
  
  if (smallerDF$category[i] == "Opponent") {
    #print(i)
    if (i != 1 && smallerDF$category[i-1] == "Opponent") {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
  else if (smallerDF$category[i] %in% dfList) {
    if (i != 1 && smallerDF$category[i-1] %in% dfList) {
      smallerDF$`Score Sum`[i] <- smallerDF$Value[i] + smallerDF$`Score Sum`[i-1]
    }
  }
}

This works up until row 10 since I am using [i-1], but I can't figure out how to get row 10 to reference back to row 4 (the last time Opponent was used) to add cell [10,3] onto cell [4,4].

The final result should look like

category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           4
11 Opponent Bad Pass            -2           2
12 P1       Good Pass            2           8

I tried incorporating the use of this code

dt <- data.table(smallerDF)
newDT <- dt[ , .SD[.N] ,  by = c("category") ]

but this only returns the very last row for each different value in category, and not the latest/previous occurrence of the category.

Any help would be greatly appreciated. Thanks

Upvotes: 1

Views: 642

Answers (2)

Vin&#237;cius F&#233;lix
Vin&#237;cius F&#233;lix

Reputation: 8811

Here is a tidyverse solution

smallerDF %>% 
  #Removing original values from your data
  select(-Value,-`Score Sum`) %>% 
  #Creating Value variable with case_when
  mutate(
    Value = case_when(
      Event == "Good Pass" ~ 2,
      Event == "Bad Pass" ~ -2,
      Event == "Intercepted Pass" ~ 1,
      Event == "Turnover" ~ -3
    ),
    #Creating auxiliar logical variable (opponent or not oppponent)
    Opponent = if_else(category == "Opponent",TRUE,FALSE)
  ) %>% 
  #Creating cumulative sum by either Opponent or not oppponent
  group_by(Opponent) %>% 
  mutate(`Score sum` = cumsum(Value))

-output

 A tibble: 12 x 4
   category Event            Value `Score Sum`
   <chr>    <chr>            <dbl>       <dbl>
 1 Opponent Good Pass            2           2
 2 Opponent Good Pass            2           4
 3 Opponent Good Pass            2           6
 4 Opponent Turnover            -3           3
 5 P1       Good Pass            2           2
 6 P2       Good Pass            2           4
 7 P3       Good Pass            2           6
 8 P2       Good Pass            2           8
 9 P2       Bad Pass            -2           6
10 Opponent Intercepted Pass     1           1
11 Opponent Bad Pass            -2          -1
12 P1       Good Pass            2           2

Upvotes: 0

r2evans
r2evans

Reputation: 160407

I think the base premise here is a grouped-calculation (not easy in a for loop), and it should be grouped on whether category is "Opponnent" or not (lumping "P1", "P2", etc, together).

Data prep: starting with just the first two columns of your dataset above:

smallerDF <- structure(list(category = c("Opponent", "Opponent", "Opponent", "Opponent", "P1", "P2", "P3", "P2", "P2", "Opponent", "Opponent", "P1"), Event = c("Good Pass", "Good Pass", "Good Pass", "Turnover", "Good Pass", "Good Pass", "Good Pass", "Good Pass", "Bad Pass", "Intercepted Pass", "Bad Pass", "Good Pass")), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))

I'll add a "time" column: some utilities (e.g., base::merge) do not honor row-order, despite some best efforts. I think it's generally safer anyway to have a "time" component in there to remove accidental reordering. Neither the data.table nor dplyr solutions below reorder it inadvertently, but it's still not a horrible idea.

smallerDF$time <- seq_len(nrow(smallerDF))

base R

This is perhaps the least-intuitive of the three, since the grouping functions in R can seem daunting. These include ave, aggregate, by, tapply, etc. I'll stick with ave for now, as it is the simplest and perhaps easiest to read.

First, we'll create a "merge/join" table for Value (other ways exist to bring in these values, see https://stackoverflow.com/a/68999591/3358272; @ViníciusFélix's answer is a great example using case_when for this purpose). Second, we'll aggregate by "Opponent vs NotOpponent".

values <- data.frame(
  Event = c("Good Pass", "Bad Pass", "Intercepted Pass", "Turnover"),
  Value = c(2, -2, 1, -3)
)
smallerDF2 <- merge(smallerDF, values, by = "Event", all.x = TRUE, sort = FALSE)
## feel free to verify that `smallerDF2` is no longer in the original order,
## despite `sort=FALSE`. Order is not guaranteed with `base::merge`, period.
smallerDF2 <- smallerDF2[order(smallerDF2$time),]
smallerDF2
#               Event category time Value
# 1         Good Pass Opponent    1     2
# 2         Good Pass Opponent    2     2
# 3         Good Pass Opponent    3     2
# 9          Turnover Opponent    4    -3
# 5         Good Pass       P1    5     2
# 6         Good Pass       P2    6     2
# 7         Good Pass       P3    7     2
# 4         Good Pass       P2    8     2
# 10         Bad Pass       P2    9    -2
# 12 Intercepted Pass Opponent   10     1
# 11         Bad Pass Opponent   11    -2
# 8         Good Pass       P1   12     2
smallerDF2$`Score Sum2` <- ave(smallerDF2$Value, smallerDF2$category == "Opponent", FUN = cumsum)
smallerDF2
#               Event category time Value Score Sum2
# 1         Good Pass Opponent    1     2          2
# 2         Good Pass Opponent    2     2          4
# 3         Good Pass Opponent    3     2          6
# 9          Turnover Opponent    4    -3          3
# 5         Good Pass       P1    5     2          2
# 6         Good Pass       P2    6     2          4
# 7         Good Pass       P3    7     2          6
# 4         Good Pass       P2    8     2          8
# 10         Bad Pass       P2    9    -2          6
# 12 Intercepted Pass Opponent   10     1          4
# 11         Bad Pass Opponent   11    -2          2
# 8         Good Pass       P1   12     2          8

data.table

library(data.table)
smallerDT <- as.data.table(smallerDF)
smallerDT[values, Value := Value, on = .(Event)]
smallerDT[, `Score Sum2` := cumsum(Value), by = .(category == "Opponent")]

dplyr

library(dplyr)
left_join(smallerDF, values, by = "Event") %>%
  group_by(g = (category == "Opponent")) %>%
  mutate(`Score Sum` = cumsum(Value)) %>%
  ungroup() %>%
  select(-g)

Upvotes: 2

Related Questions