codez0mb1e
codez0mb1e

Reputation: 906

Sliding window on date-time field in data frame R

I want to calculate a some descriptive statistics:

Example of a data frame:

dt <- tibble(
    order_id = 1:10,
    customer_id = c(1, rep(2, 2), rep(3, 3), rep(4, 4)),
    amount = seq(10, 100, by = 10),
    date = c("2020-10-07 12:00", # 1st customer
             "2020-10-07 12:00", "2020-10-08 11:00", # 2st customer
             "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", # 3rd customer
             "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", "2020-10-08 21:00") # 4th customer
  ) %>% 
  mutate(
    date = lubridate::ymd_hm(date)
  )

Expected result:

order_id customer_id amount date                amount__n__last_24h amount__mean__last_24h
<int>       <dbl>  <dbl> <dttm>                            <dbl>                  <dbl>
1           1     10 2020-10-07 12:00:00                   0                     NA
2           2     20 2020-10-07 12:00:00                   0                     NA
3           2     30 2020-10-08 11:00:00                   1                     20
4           3     40 2020-10-07 12:00:00                   0                     NA
5           3     50 2020-10-08 11:00:00                   1                     40
6           3     60 2020-10-08 20:00:00                   1                     50
7           4     70 2020-10-07 12:00:00                   0                     NA
8           4     80 2020-10-08 11:00:00                   1                     70
9           4     90 2020-10-08 20:00:00                   1                     80
10          4    100 2020-10-08 21:00:00                   2                     85

How can I do this?

Upvotes: 0

Views: 250

Answers (1)

chinsoon12
chinsoon12

Reputation: 25223

Here is an option using data.table:

dt[, dayago := date - 24 * 60 * 60]
dt[, c("n", "avg") :=
    dt[dt, on=.(customer_id, date>=dayago, date<date),
        by=.EACHI, .(n=.N, avg=mean(amount))][, (1L:3L) := NULL]
    ]

data:

library(data.table)
dt <- data.table(
    order_id = 1:10,
    customer_id = c(1, rep(2, 2), rep(3, 3), rep(4, 4)),
    amount = seq(10, 100, by = 10),
    date = as.POSIXct(c("2020-10-07 12:00", # 1st customer
        "2020-10-07 12:00", "2020-10-08 11:00", # 2st customer
        "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", # 3rd customer
        "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", "2020-10-08 21:00" # 4th customer
), format=("%Y-%m-%d %H:%M")))

Upvotes: 1

Related Questions