Reputation: 16014
Consider
a = data.frame(order = runif(1e7), id = sample(1000, 1e7, replace=TRUE), num = runif(1e7))
I want to compute the max
of the num
led 1 to n
rows group by id
.
For example, in dplyr I can do below to compute the max of lead 1, 2, 3 rows
library(dplyr)
a2 = a %>%
group_by(id) %>%
arrange(order) %>%
mutate(max_num = pmax(lead(num, 1), lead(num, 2), lead(num, 3), na.rm=T)) %>%
ungroup(id) %>%
arrange(id, order)
But this seems really inefficient and requires a lot of typing (for n
large). Is there a more elegant and efficient way to achieve this?
Upvotes: 0
Views: 97
Reputation: 25225
An option using data.table
:
library(data.table)
n <- 3L
setDT(a, key=c("id", "order"))[,
max_num := do.call(pmax, c(shift(num, -n:-1L), list(na.rm=TRUE))), id]
a
output:
order id num max_num
1: 0.0001883428 1 0.8529981 0.7802871
2: 0.0002661028 1 0.7802871 0.8935886
3: 0.0003412480 1 0.1904755 0.8935886
4: 0.0004550556 1 0.7339801 0.8935886
5: 0.0004603295 1 0.8935886 0.6957475
---
9999996: 0.9996615215 1000 0.9226337 0.3970948
9999997: 0.9997126870 1000 0.1964909 0.6950569
9999998: 0.9997212037 1000 0.2761086 0.6950569
9999999: 0.9998292420 1000 0.3970948 0.6950569
10000000: 0.9998487560 1000 0.6950569 NA
data:
library(data.table)
set.seed(0L)
nr <- 1e7
a = data.frame(order = runif(nr), id = sample(1000, nr, replace=TRUE), num = runif(nr))
Upvotes: 1