How to get the difference of a lagged variable by date?

Consider the following example:

library(tidyverse)
library(lubridate)

df = tibble(client_id = rep(1:3, each=24),
            date = rep(seq(ymd("2016-01-01"), (ymd("2016-12-01") + years(1)), by='month'), 3),
            expenditure = runif(72))

In df you have stored information on monthly expenditure from a bunch of clients for the past 2 years. Now you want to calculate the monthly difference between this year and the previous year for each client.

Is there any way of doing this while keeping the dataset in "long" format? Here is how I am doing it now, which requires reshaping to wide format:

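# Build one column per year-month (val_201601 ... val_201712), one row per client
df2 = df %>% 
  mutate(date2 = paste0('val_',
                        year(date), 
                        formatC(month(date), width=2, flag="0"))) %>% 
  select(client_id, date2, expenditure) %>% 
  pivot_wider(names_from = date2, 
              values_from = expenditure)

# Columns 2:13 hold 2016, columns 14:25 hold 2017 (column 1 is client_id)
df3 = (df2[,2:13] - df2[,14:25])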

However, I find this unnecessarily complex, and on large datasets reshaping from long to wide can take quite a lot of time, so I think there must be a better way of doing it.

Upvotes: 2

Views: 286

Answers (2)

akrun

Reputation: 886938

An option with data.table

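library(data.table)
# Group by client and calendar month (the year is dropped); within each group
# -diff() returns the 2016 value minus the 2017 value
setDT(df)[, .(diff = -diff(expenditure)), .(client_id, month_date = month(date))]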
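If you would rather keep every row instead of collapsing to one row per client and month, a lagged column also works in data.table. The sketch below is an illustration, not part of the original answer: it assumes each client has all 24 consecutive months with no gaps, uses `shift()` to look back 12 rows within each client, and uses an arbitrary `set.seed(1)` only to make the example reproducible. Note the sign: it computes this year minus the previous year, so negate it if you want the same sign as `df2[,2:13] - df2[,14:25]` in the question.

```r
library(data.table)
library(lubridate)

set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.table(client_id = rep(1:3, each = 24),
                 date = rep(seq(ymd("2016-01-01"), ymd("2017-12-01"), by = "month"), 3),
                 expenditure = runif(72))

# Sort by client and date so that shift() looks back in time within each client
setorder(df, client_id, date)

# Lag by 12 rows within each client; assumes no missing months
df[, diff := expenditure - shift(expenditure, n = 12L, type = "lag"), by = client_id]

# The first 12 months of each client have no previous-year value, so diff is NA there
df[!is.na(diff)]
```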

Upvotes: 1

Ronak Shah

Reputation: 388817

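If you want to keep the data in long format, one way is to group by client_id and the month-day part of the date (so the year is dropped), and then take the difference between the two years with diff(). The minus sign gives the earlier year minus the later year, matching df2[,2:13] - df2[,14:25] in the question.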

library(dplyr)

df %>% 
  group_by(client_id, month_date = format(date, "%m-%d")) %>%
  summarise(diff = -diff(expenditure))

#   client_id month_date  diff
#       <int> <chr>       <dbl>
# 1         1 01-01       0.278  
# 2         1 02-01      -0.0421 
# 3         1 03-01       0.0117 
# 4         1 04-01      -0.0440 
# 5         1 05-01       0.855  
# 6         1 06-01       0.354  
# 7         1 07-01      -0.226  
# 8         1 08-01       0.506  
# 9         1 09-01       0.119  
#10         1 10-01       0.00819
# … with 26 more rows
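A closely related option, matching the "lagged variable" wording of the question, keeps all 72 rows and adds the difference as a new column. This is a sketch rather than part of the answer above: the `lag()` version assumes complete, consecutive months per client, and the self-join version (matching each row to the same client's row dated one year earlier) is shown as an alternative that does not depend on row positions. The seed is arbitrary, and the result is this year minus the previous year, so negate it to get the question's sign.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- tibble(client_id = rep(1:3, each = 24),
             date = rep(seq(ymd("2016-01-01"), ymd("2017-12-01"), by = "month"), 3),
             expenditure = runif(72))

# Option 1: positional lag of 12 rows within each client (assumes no missing months)
df_diff <- df %>%
  group_by(client_id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(diff = expenditure - dplyr::lag(expenditure, n = 12)) %>%
  ungroup()

# Option 2: join each row to the same client's value from exactly one year earlier
prev_year <- df %>%
  mutate(date = date + years(1)) %>%
  rename(expenditure_prev = expenditure)

df_join <- df %>%
  left_join(prev_year, by = c("client_id", "date")) %>%
  mutate(diff = expenditure - expenditure_prev)

df_diff %>% filter(!is.na(diff))
```

With complete data both options give the same diff column; the join is the safer choice if some client-months can be missing, because a positional lag would then compare the wrong months.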

Upvotes: 1
