Reputation: 63
I am thinking of getting max of (N-1) rows by group in R (but not including the current row). Not sure how should I start with since each group has dynamics number of rows. For example:
Thanks =)
Upvotes: 1
Views: 318
Reputation: 83275
A possible solution with the data.table
-package and the cummax
and shift
functions:
library(data.table)
setDT(df)[, lagged_max := shift(cummax(Count)), by = Group][]
which gives:
> df Group Count lagged_max 1: A 23 NA 2: A 12 23 3: A 145 23 4: B 123 NA 5: B 34 123 6: B 456 123 7: B 555 456
What this does:
setDT(df)
converts the dataframe df
to a data.table
.cummax(Count)
gets the cumulative maximum of Count
.shift
the result is shifted one position upward because the default values for the n
, type
& fill
parameters are 1L
, lag
an NA
respectively, Written in full it is: shift(cummax(Count), n = 1L, type = 'lag', fill = NA)
.The same logic in base R:
transform(df, lagged_max = ave(Count, Group, FUN = function(x) c(NA, head(cummax(x), -1))))
Or with dplyr
:
library(dplyr)
df %>%
group_by(Group) %>%
mutate(lagged_max = lag(cummax(Count)))
Used data:
df <- data.frame(Group = c("A", "A", "A", "B", "B", "B", "B"),
Count = c(23, 12, 145, 123, 34, 456, 555))
Upvotes: 3
Reputation: 270065
Using DF
defined reproducibly in the Note at the end, we can use rollapply
to apply max
taking the maximum of all prior values where specifying a width of list(-seq(.N))
to rollapply
means to use offsets -1, -2, ..., -.N . partial=TRUE
says to use whatever number of values are available among the specified offsets even if some of the offsets are not available. Finally run it on each group using by=
.
This solution uses rollapply
and data.table, as requested, and involves direct specification of which offsets to use as opposed to shifting or lagging.
We could change max
to min
, sum
, mean
, median
or anything else that is appropriate if we wanted those. If we wanted to change this to just find the maximum of the prior k values then we could replace -seq(.N)
with -seq(k)
.
library(data.table)
library(zoo)
DT <- as.data.table(DF)
DT[, max := rollapply(Count, list(-seq(.N)), max, partial = TRUE, fill = NA), by = Group]
giving:
> DT
Group Count max
1: A 23 NA
2: A 12 23
3: A 145 23
4: B 123 NA
5: B 34 123
6: B 456 123
7: B 555 456
The input DF
in reproducible form is:
DF <- data.frame(Group = c("A", "A", "A", "B", "B", "B", "B"),
Count = c(23, 12, 145, 123, 34, 456, 555))
Fixed original and updated it with a shorter solution and one which uses data.table as per data.table tag on question.
Upvotes: 3
Reputation: 50728
Here is solution using dplyr::lag
:
# Sample data
df <- data.frame(
Group = c("A", "A", "A", "B", "B", "B", "B"),
Count = c(23, 12, 145, 123, 34, 456, 555))
library(tidyverse);
df %>%
group_by(Group) %>%
mutate(
n1 = lag(Count, 1),
n2 = lag(Count, 2),
max = ifelse(!is.na(n2) & !is.na(n1) & n2 > n1, n2, n1)) %>%
select(-n1, -n2)
## A tibble: 7 x 3
## Groups: Group [2]
# Group Count max
# <fct> <dbl> <dbl>
#1 A 23. NA
#2 A 12. 23.
#3 A 145. 23.
#4 B 123. NA
#5 B 34. 123.
#6 B 456. 123.
#7 B 555. 456.
Upvotes: 1