Yus Ra
Yus Ra

Reputation: 107

Identify change in value over time with group_by, R

I'm trying to create a dummy based on if a variable within a group changes over time.

Here is my data:

Winner <- c("X", "Y", "Z", "Y", "Y", "Z")
year<-c("2019", "2019", "2019", "2019", "2020", "2020")
CPV <- c(0, 1, 1, 1,0,1)
test <- data.frame(Winner, year, CPV)
Winner    year     CPV
X         2019      0
Y         2019      1         
Z         2019      1
Y         2019      1
Y         2020      0
Z         2020      1

So, I would like to track if CPV changes between 2019 and 2020 for each Winner. If there is a change, a dummy variable change is 1, 0 if it remains the same, and NA if there are no observations for the next year.

Desired output:

Winner    year     CPV   change
X         2019      0       NA
Y         2019      1       1         
Z         2019      1       0
Y         2019      1       1
Y         2020      0       1
Z         2020      1       0

I've tried this code:

CPV_change<-test %>%
  group_by(Winner, year) %>%
  mutate(change = case_when(CPV != lag(CPV) ~ TRUE,TRUE ~ FALSE))

However, the use of the lag here isn't producing the output grouped by the variables Winner, year, but appears to be basing the change on sequential ordering.

Output that the above code produces:

Winner    year     CPV      change
X         2019      0       FALSE
Y         2019      1       TRUE         
Z         2019      1       TRUE
Y         2019      1       TRUE
Y         2020      0       FALSE
Z         2020      1       FALSE

Any ideas about how I can do this? Your help is much appreciated.

Upvotes: 1

Views: 1043

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389165

Using case_when :

library(dplyr)

test %>%
  arrange(Winner, year) %>%
  group_by(Winner) %>%
                            #No next year data
  mutate(change = case_when(n_distinct(year) == 1 ~ NA_real_, 
                            # If there is a change
                            any(diff(CPV) != 0) ~ 1, 
                            #No change
                            TRUE ~ 0)) %>%
   ungroup

 Winner year    CPV change
  <chr>  <chr> <dbl>  <dbl>
1 X      2019      0     NA
2 Y      2019      1      1
3 Y      2019      1      1
4 Y      2020      0      1
5 Z      2019      1      0
6 Z      2020      1      0

In data.table :

library(data.table)
setDT(test)[order(Winner, year), change := fcase(uniqueN(year) == 1, NA_real_,
                              any(diff(CPV) != 0), 1,
                              default = 0), Winner]

Upvotes: 3

Related Questions