Reputation: 1271
I am looking at time series data and trying to identify historical maximums.
I am trying to do this by iterating over a vector and checking if the value I am looking at is greater than or equal to the max of the data up to this point. I can write a function for this, but I am struggling when I want to apply it to a grouped data frame.
Here is an example:
set.seed(32)
x <- data.frame(time = c(1:6),
value = runif(6))
> x
time value
1 1 0.5058405
2 2 0.5948084
3 3 0.8087471
4 4 0.7288197
5 5 0.1519876
6 6 0.9561873
#write a function to identify the records
#function takes an index
#checks whether the number at that index is greater than or equal to the maximum of the preceding values to that index
max_v <- function(index) {
output <- x$value[index] >= max(x$value[1:index])
output
}
#create the record variable
x$record <- sapply(1:nrow(x), max_v)
> x
time value record
1 1 0.5058405 TRUE
2 2 0.5948084 TRUE
3 3 0.8087471 TRUE
4 4 0.7288197 FALSE
5 5 0.1519876 FALSE
6 6 0.9561873 TRUE
The function works well. However, the challenge I am facing is that I want to apply this to a data frame grouped by the type variable created below:
set.seed(32)
x <- data.frame(time = rep(c(1:6),2),
type = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
value = runif(12))
> x
time type value
1 1 1 0.5058405
2 2 1 0.5948084
3 3 1 0.8087471
4 4 1 0.7288197
5 5 1 0.1519876
6 6 1 0.9561873
7 1 2 0.7535377
8 2 2 0.8520623
9 3 2 0.6734418
10 4 2 0.3871255
11 5 2 0.6580025
12 6 2 0.3213696
What I want is:
> x
time type value record
1 1 1 0.5058405 TRUE
2 2 1 0.5948084 TRUE
3 3 1 0.8087471 TRUE
4 4 1 0.7288197 FALSE
5 5 1 0.1519876 FALSE
6 6 1 0.9561873 TRUE
7 1 2 0.7535377 TRUE
8 2 2 0.8520623 TRUE
9 3 2 0.6734418 FALSE
10 4 2 0.3871255 FALSE
11 5 2 0.6580025 FALSE
12 6 2 0.3213696 FALSE
I have tried group_map and tapply, but I can't seem to get intelligible results, as I don't know how to pass the vector of indexes that I want to apply/map over.
Upvotes: 0
Views: 37
Reputation: 887158
We can use split/unsplit in base R. This needs a slight modification of the OP's function, because x is hardcoded into it. Instead, pass the data as a new argument, dat:
max_v <- function(dat, index) {
  output <- dat$value[index] >= max(dat$value[1:index])
  output
}
x$record <- unsplit(lapply(split(x, x$type), function(y)
  sapply(seq_len(nrow(y)), function(u) max_v(y, u))), x$type)
Output:
x
time type value record
1 1 1 0.5058405 TRUE
2 2 1 0.5948084 TRUE
3 3 1 0.8087471 TRUE
4 4 1 0.7288197 FALSE
5 5 1 0.1519876 FALSE
6 6 1 0.9561873 TRUE
7 1 2 0.7535377 TRUE
8 2 2 0.8520623 TRUE
9 3 2 0.6734418 FALSE
10 4 2 0.3871255 FALSE
11 5 2 0.6580025 FALSE
12 6 2 0.3213696 FALSE
Or using data.table with cummax (similar logic as in @27 ϕ 9):
library(data.table)
setDT(x)[, record := cummax(value) == value, type]
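The reason cummax works here: the running maximum at position i is the same as max(value[1:i]), so comparing it with the value itself is the same test as the OP's max_v, just without the explicit loop. A minimal base R sketch of that equivalence, assuming a plain numeric vector v (the name is only illustrative, and the values are rounded from the question's first group):

```r
# Illustrative vector; values rounded from the question's first group
v <- c(0.51, 0.59, 0.81, 0.73, 0.15, 0.96)

# Loop-style check, as in the question: is v[i] >= the max of v[1..i]?
loop_record <- sapply(seq_along(v), function(i) v[i] >= max(v[1:i]))

# Vectorized check: the running maximum equals the value exactly at a record
cummax_record <- cummax(v) == v

# Both approaches should flag the same positions
identical(loop_record, cummax_record)
```

Applying the comparison per group (here via data.table's by argument) is then all that is needed for the grouped case.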
Upvotes: 1
Reputation: 79228
Another Base R option:
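# Note: tapply() returns the results grouped by type, so this assumes the rows of x are already ordered by type
x <- transform(x, record = unlist(tapply(value, type, FUN = cummax)) == value)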
x
time type value record
1 1 1 0.5058405 TRUE
2 2 1 0.5948084 TRUE
3 3 1 0.8087471 TRUE
4 4 1 0.7288197 FALSE
5 5 1 0.1519876 FALSE
6 6 1 0.9561873 TRUE
7 1 2 0.7535377 TRUE
8 2 2 0.8520623 TRUE
9 3 2 0.6734418 FALSE
10 4 2 0.3871255 FALSE
11 5 2 0.6580025 FALSE
12 6 2 0.3213696 FALSE
You could also do:
library(dplyr)
x %>%
  group_by(type) %>%
  mutate(record = cummax(value) == value)
Upvotes: 1
Reputation: 34461
You can compare each value against the cumulative max within its group.
x$record <- as.logical(with(x, ave(value, type, FUN = \(v) v == cummax(v))))
x
time type value record
1 1 1 0.5058405 TRUE
2 2 1 0.5948084 TRUE
3 3 1 0.8087471 TRUE
4 4 1 0.7288197 FALSE
5 5 1 0.1519876 FALSE
6 6 1 0.9561873 TRUE
7 1 2 0.7535377 TRUE
8 2 2 0.8520623 TRUE
9 3 2 0.6734418 FALSE
10 4 2 0.3871255 FALSE
11 5 2 0.6580025 FALSE
12 6 2 0.3213696 FALSE
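A note on the as.logical() wrapper: ave() writes the results of FUN back into a copy of its numeric input, so logical results come back coerced to 0/1. A small sketch of that behaviour, assuming a toy data frame d (hypothetical values, not from the question):

```r
# Toy data (hypothetical values, not from the question)
d <- data.frame(type = c(1, 1, 2, 2), value = c(2, 1, 3, 4))

# FUN returns logicals, but ave() stores them in a numeric vector
raw <- with(d, ave(value, type, FUN = function(v) v == cummax(v)))
class(raw)   # numeric, not logical

# Convert the 0/1 flags back to TRUE/FALSE
record <- as.logical(raw)
record
```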
Upvotes: 2