R dplyr does not finish lagged date difference computation

Question

I have a data frame such as this:

> bp
Source: local data frame [6 x 4]

        date amount accountId type
1 2015-06-11  101.2         1    a
2 2015-06-18  101.2         1    a
3 2015-06-24  101.2         1    b
4 2015-06-11  294.0         2    a
5 2015-06-18   48.0         2    a
6 2015-06-26   10.0         2    b

It has 3.4 million rows of data:

> nrow(bp)
[1] 3391874
>

I am trying to compute lagged differences of time in days as follows using dplyr:

bp <- bp %>% group_by(accountId) %>%
  mutate(diff = as.numeric(date - lag(date)))

On my 8GB memory macbook, R crashes. On a 64GB Linux server the code is taking forever. Any ideas on fixing this problem?

R dplyr does not finish lagged date difference computation

Answers (1)

Related Questions