xiaodai
xiaodai

Reputation: 16014

R: How to the compute the maximum of a column's lead next few rows by group?

Consider

a = data.frame(order = runif(1e7), id = sample(1000, 1e7, replace=TRUE), num = runif(1e7))

I want to compute the max of the num led 1 to n rows group by id.

For example, in dplyr I can do below to compute the max of lead 1, 2, 3 rows

library(dplyr)
a2 = a %>%
  group_by(id) %>%
  arrange(order) %>%
  mutate(max_num = pmax(lead(num, 1), lead(num, 2), lead(num, 3), na.rm=T)) %>% 
  ungroup(id) %>% 
  arrange(id, order)

But this seems really inefficient and requires a lot of typing (for n large). Is there a more elegant and efficient way to achieve this?

Upvotes: 0

Views: 97

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

An option using data.table:

library(data.table)
n <- 3L
setDT(a, key=c("id", "order"))[, 
    max_num := do.call(pmax, c(shift(num, -n:-1L), list(na.rm=TRUE))), id]
a

output:

                 order   id       num   max_num
       1: 0.0001883428    1 0.8529981 0.7802871
       2: 0.0002661028    1 0.7802871 0.8935886
       3: 0.0003412480    1 0.1904755 0.8935886
       4: 0.0004550556    1 0.7339801 0.8935886
       5: 0.0004603295    1 0.8935886 0.6957475
      ---                                      
 9999996: 0.9996615215 1000 0.9226337 0.3970948
 9999997: 0.9997126870 1000 0.1964909 0.6950569
 9999998: 0.9997212037 1000 0.2761086 0.6950569
 9999999: 0.9998292420 1000 0.3970948 0.6950569
10000000: 0.9998487560 1000 0.6950569        NA

data:

library(data.table)
set.seed(0L)
nr <- 1e7
a = data.frame(order = runif(nr), id = sample(1000, nr, replace=TRUE), num = runif(nr))

Upvotes: 1

Related Questions