Identify change in value over time with group_by, R

Question

I'm trying to create a dummy based on if a variable within a group changes over time.

Here is my data:

Winner <- c("X", "Y", "Z", "Y", "Y", "Z")
year<-c("2019", "2019", "2019", "2019", "2020", "2020")
CPV <- c(0, 1, 1, 1,0,1)
test <- data.frame(Winner, year, CPV)

Winner    year     CPV
X         2019      0
Y         2019      1         
Z         2019      1
Y         2019      1
Y         2020      0
Z         2020      1

So, I would like to track if CPV changes between 2019 and 2020 for each Winner. If there is a change, a dummy variable change is 1, 0 if it remains the same, and NA if there are no observations for the next year.

Desired output:

Winner    year     CPV   change
X         2019      0       NA
Y         2019      1       1         
Z         2019      1       0
Y         2019      1       1
Y         2020      0       1
Z         2020      1       0

I've tried this code:

CPV_change<-test %>%
  group_by(Winner, year) %>%
  mutate(change = case_when(CPV != lag(CPV) ~ TRUE,TRUE ~ FALSE))

However, the use of the lag here isn't producing the output grouped by the variables Winner, year, but appears to be basing the change on sequential ordering.

Output that the above code produces:

Winner    year     CPV      change
X         2019      0       FALSE
Y         2019      1       TRUE         
Z         2019      1       TRUE
Y         2019      1       TRUE
Y         2020      0       FALSE
Z         2020      1       FALSE

Any ideas about how I can do this? Your help is much appreciated.

Ronak Shah · Accepted Answer

Using case_when :

library(dplyr)

test %>%
  arrange(Winner, year) %>%
  group_by(Winner) %>%
                            #No next year data
  mutate(change = case_when(n_distinct(year) == 1 ~ NA_real_, 
                            # If there is a change
                            any(diff(CPV) != 0) ~ 1, 
                            #No change
                            TRUE ~ 0)) %>%
   ungroup

 Winner year    CPV change
       
1 X      2019      0     NA
2 Y      2019      1      1
3 Y      2019      1      1
4 Y      2020      0      1
5 Z      2019      1      0
6 Z      2020      1      0

In data.table :

library(data.table)
setDT(test)[order(Winner, year), change := fcase(uniqueN(year) == 1, NA_real_,
                              any(diff(CPV) != 0), 1,
                              default = 0), Winner]

Identify change in value over time with group_by, R

Answers (1)

Related Questions