Reputation: 107
I'm trying to create a dummy based on if a variable within a group changes over time.
Here is my data:
Winner <- c("X", "Y", "Z", "Y", "Y", "Z")
year<-c("2019", "2019", "2019", "2019", "2020", "2020")
CPV <- c(0, 1, 1, 1,0,1)
test <- data.frame(Winner, year, CPV)
Winner year CPV
X 2019 0
Y 2019 1
Z 2019 1
Y 2019 1
Y 2020 0
Z 2020 1
So, I would like to track if CPV
changes between 2019 and 2020 for each Winner
. If there is a change, a dummy variable change
is 1, 0 if it remains the same, and NA if there are no observations for the next year.
Desired output:
Winner year CPV change
X 2019 0 NA
Y 2019 1 1
Z 2019 1 0
Y 2019 1 1
Y 2020 0 1
Z 2020 1 0
I've tried this code:
CPV_change<-test %>%
group_by(Winner, year) %>%
mutate(change = case_when(CPV != lag(CPV) ~ TRUE,TRUE ~ FALSE))
However, the use of the lag here isn't producing the output grouped by the variables Winner
, year
, but appears to be basing the change on sequential ordering.
Output that the above code produces:
Winner year CPV change
X 2019 0 FALSE
Y 2019 1 TRUE
Z 2019 1 TRUE
Y 2019 1 TRUE
Y 2020 0 FALSE
Z 2020 1 FALSE
Any ideas about how I can do this? Your help is much appreciated.
Upvotes: 1
Views: 1043
Reputation: 389165
Using case_when
:
library(dplyr)
test %>%
arrange(Winner, year) %>%
group_by(Winner) %>%
#No next year data
mutate(change = case_when(n_distinct(year) == 1 ~ NA_real_,
# If there is a change
any(diff(CPV) != 0) ~ 1,
#No change
TRUE ~ 0)) %>%
ungroup
Winner year CPV change
<chr> <chr> <dbl> <dbl>
1 X 2019 0 NA
2 Y 2019 1 1
3 Y 2019 1 1
4 Y 2020 0 1
5 Z 2019 1 0
6 Z 2020 1 0
In data.table
:
library(data.table)
setDT(test)[order(Winner, year), change := fcase(uniqueN(year) == 1, NA_real_,
any(diff(CPV) != 0), 1,
default = 0), Winner]
Upvotes: 3