Modern_ibex
Modern_ibex

Reputation: 93

Dplyr mutate does not condition correctly on separate (date) column

I have a time-series dataset in which I would like to refer to a value in a previous year, if it exists. I therefore create a helper column with the date I am referring to and expect the code to retrieve the value from that exact year. However, this is not happening, rather it retrieves the same value in all rows, corresponding to the first possible year.

I use the following code

library(dplyr)
library(lubridate)

dataset <- data.frame(names=c("a","a","a","a","a","a"),
                      values=c(2,3,4,5,6,7),
                      dates=dmy(c("01/01/2010","01/01/2011","01/01/2012","01/01/2013","01/01/2014","01/01/2015")))

dataset_calc <- dataset %>% 
  group_by(names) %>% 
  mutate(yoy=case_when(dates>=dmy("01/01/2011") ~ dates-years(1),
                       TRUE ~ dmy("01/01/2010"))) %>% 
  ungroup()

final <- dataset_calc  %>% 
 mutate(yoyval= values[dates==yoy])

But get this result:

  names values dates      yoy        yoyval
  <chr>  <dbl> <date>     <date>      <dbl>
1 a          2 2010-01-01 2010-01-01      2
2 a          3 2011-01-01 2010-01-01      2
3 a          4 2012-01-01 2011-01-01      2
4 a          5 2013-01-01 2012-01-01      2
5 a          6 2014-01-01 2013-01-01      2
6 a          7 2015-01-01 2014-01-01      2

Where I'd expect to get the following:

  names values dates      yoy        yoyval
  <chr>  <dbl> <date>     <date>      <dbl>
1 a          2 2010-01-01 2010-01-01      2
2 a          3 2011-01-01 2010-01-01      2
3 a          4 2012-01-01 2011-01-01      3
4 a          5 2013-01-01 2012-01-01      4
5 a          6 2014-01-01 2013-01-01      5
6 a          7 2015-01-01 2014-01-01      6

Am I not using the conditioning in the square brackets correctly, or do I misunderstand how mutate uses the condition?

Upvotes: 0

Views: 191

Answers (1)

pieterbons
pieterbons

Reputation: 1724

You can do a left_join on the new column yoy with the original dates to find the corresponding values for each year:

final <- dataset_calc  %>% 
  left_join(dataset_calc %>% select(values, dates), by = c('yoy' = 'dates'))

This gives the desired result.

If you want to incorporate the action in the existing pipeline, you can reference the intermediate result with eval(.) and perform a sort-of self join:


dataset_calc <- dataset %>% 
  group_by(names) %>% 
  mutate(yoy=case_when(dates>=dmy("01/01/2011") ~ dates-years(1),
                       TRUE ~ dmy("01/01/2010"))) %>% 
  ungroup() %>% 
  left_join(eval(.)  %>% select(values, dates), by = c('yoy' = 'dates'))

Upvotes: 1

Related Questions