Reputation: 93
I have a time-series dataset in which I would like to refer to a value in a previous year, if it exists. I therefore create a helper column with the date I am referring to and expect the code to retrieve the value from that exact year. However, this is not happening, rather it retrieves the same value in all rows, corresponding to the first possible year.
I use the following code
library(dplyr)
library(lubridate)
dataset <- data.frame(names=c("a","a","a","a","a","a"),
values=c(2,3,4,5,6,7),
dates=dmy(c("01/01/2010","01/01/2011","01/01/2012","01/01/2013","01/01/2014","01/01/2015")))
dataset_calc <- dataset %>%
group_by(names) %>%
mutate(yoy=case_when(dates>=dmy("01/01/2011") ~ dates-years(1),
TRUE ~ dmy("01/01/2010"))) %>%
ungroup()
final <- dataset_calc %>%
mutate(yoyval= values[dates==yoy])
But get this result:
names values dates yoy yoyval
<chr> <dbl> <date> <date> <dbl>
1 a 2 2010-01-01 2010-01-01 2
2 a 3 2011-01-01 2010-01-01 2
3 a 4 2012-01-01 2011-01-01 2
4 a 5 2013-01-01 2012-01-01 2
5 a 6 2014-01-01 2013-01-01 2
6 a 7 2015-01-01 2014-01-01 2
Where I'd expect to get the following:
names values dates yoy yoyval
<chr> <dbl> <date> <date> <dbl>
1 a 2 2010-01-01 2010-01-01 2
2 a 3 2011-01-01 2010-01-01 2
3 a 4 2012-01-01 2011-01-01 3
4 a 5 2013-01-01 2012-01-01 4
5 a 6 2014-01-01 2013-01-01 5
6 a 7 2015-01-01 2014-01-01 6
Am I not using the conditioning in the square brackets correctly, or do I misunderstand how mutate uses the condition?
Upvotes: 0
Views: 191
Reputation: 1724
You can do a left_join on the new column yoy with the original dates to find the corresponding values for each year:
final <- dataset_calc %>%
left_join(dataset_calc %>% select(values, dates), by = c('yoy' = 'dates'))
This gives the desired result.
If you want to incorporate the action in the existing pipeline, you can reference the intermediate result with eval(.)
and perform a sort-of self join:
dataset_calc <- dataset %>%
group_by(names) %>%
mutate(yoy=case_when(dates>=dmy("01/01/2011") ~ dates-years(1),
TRUE ~ dmy("01/01/2010"))) %>%
ungroup() %>%
left_join(eval(.) %>% select(values, dates), by = c('yoy' = 'dates'))
Upvotes: 1