Reputation: 1271

Determine if next number in a time series is the max of time series so far (for grouped df)

I am looking at time series data and trying to identify historical maximums.

I am trying to do this by iterating over a vector and checking if the value I am looking at is greater than or equal to the max of the data up to this point. I can write a function for this, but I am struggling when I want to apply it to a grouped data frame.

Here is an example:

set.seed(32)
x <- data.frame(time = c(1:6), 
                value = runif(6))
> x
  time     value
1    1 0.5058405
2    2 0.5948084
3    3 0.8087471
4    4 0.7288197
5    5 0.1519876
6    6 0.9561873

# Write a function to identify the records.
# The function takes an index and checks whether the value at that index is
# greater than or equal to the maximum of all values up to and including it.
max_v <- function(index) {
  output <- x$value[index] >= max(x$value[1:index])
  output
}

#create the record variable
x$record <- sapply(1:nrow(x), max_v)
> x
  time     value record
1    1 0.5058405   TRUE
2    2 0.5948084   TRUE
3    3 0.8087471   TRUE
4    4 0.7288197  FALSE
5    5 0.1519876  FALSE
6    6 0.9561873   TRUE

The function works well for a single series. However, I want to apply it to a data frame grouped by the type variable created below:

set.seed(32)
x <- data.frame(time = rep(c(1:6),2), 
                type = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                value = runif(12))
> x
   time type     value
1     1    1 0.5058405
2     2    1 0.5948084
3     3    1 0.8087471
4     4    1 0.7288197
5     5    1 0.1519876
6     6    1 0.9561873
7     1    2 0.7535377
8     2    2 0.8520623
9     3    2 0.6734418
10    4    2 0.3871255
11    5    2 0.6580025
12    6    2 0.3213696

What I want is:

> x
   time type     value record
1     1    1 0.5058405   TRUE
2     2    1 0.5948084   TRUE
3     3    1 0.8087471   TRUE
4     4    1 0.7288197  FALSE
5     5    1 0.1519876  FALSE
6     6    1 0.9561873   TRUE
7     1    2 0.7535377   TRUE
8     2    2 0.8520623   TRUE
9     3    2 0.6734418  FALSE
10    4    2 0.3871255  FALSE
11    5    2 0.6580025  FALSE
12    6    2 0.3213696  FALSE

I have tried group_map and tapply, but I can't seem to get intelligible results, as I don't know how to pass the vector of indexes that I want to apply/map over.

Upvotes: 0

Answers (3)

akrun

Reputation: 887158

We can use split/unsplit in base R. The OP's function has x hardcoded in its body, so it needs a slight modification: pass the data as a new argument, dat.

max_v <- function(dat, index) {
  output <- dat$value[index] >= max(dat$value[1:index])
  output
}

x$record <- unsplit(lapply(split(x, x$type), function(y) 
         sapply(seq_len(nrow(y)), function(u) max_v(y, u))), x$type)

-output

x
   time type     value record
1     1    1 0.5058405   TRUE
2     2    1 0.5948084   TRUE
3     3    1 0.8087471   TRUE
4     4    1 0.7288197  FALSE
5     5    1 0.1519876  FALSE
6     6    1 0.9561873   TRUE
7     1    2 0.7535377   TRUE
8     2    2 0.8520623   TRUE
9     3    2 0.6734418  FALSE
10    4    2 0.3871255  FALSE
11    5    2 0.6580025  FALSE
12    6    2 0.3213696  FALSE

Or using data.table with cummax (similar logic as in @27 ϕ 9)

library(data.table)
setDT(x)[, record := cummax(value) == value, type]
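
To see why comparing against cummax gives the same answer as the OP's loop, here is a minimal sketch on the ungrouped values from the question (as printed, so rounded). The names record_loop and record_cummax are made up for illustration.

```r
# Values from the question's first example, as printed.
v <- c(0.5058405, 0.5948084, 0.8087471, 0.7288197, 0.1519876, 0.9561873)

# OP's approach: compare each element with the max of everything up to it.
record_loop <- sapply(seq_along(v), function(i) v[i] >= max(v[1:i]))

# Vectorised equivalent: cummax(v)[i] is max(v[1:i]), and v[i] can never
# exceed a maximum that includes itself, so >= reduces to ==.
record_cummax <- v == cummax(v)

identical(record_loop, record_cummax)
```

The grouped versions simply run the same comparison once per type.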

Upvotes: 1

Onyambu

Reputation: 79228

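Another Base R option (note that unlist(tapply(...)) returns values in group order, so this assumes the rows are already sorted by type, as they are here):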

x <- transform(x, record = unlist(tapply(value, type, FUN = cummax)) == value)
x
   time type     value record
1     1    1 0.5058405   TRUE
2     2    1 0.5948084   TRUE
3     3    1 0.8087471   TRUE
4     4    1 0.7288197  FALSE
5     5    1 0.1519876  FALSE
6     6    1 0.9561873   TRUE
7     1    2 0.7535377   TRUE
8     2    2 0.8520623   TRUE
9     3    2 0.6734418  FALSE
10    4    2 0.3871255  FALSE
11    5    2 0.6580025  FALSE
12    6    2 0.3213696  FALSE
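
One thing to watch with the tapply version: it relies on each type sitting in one contiguous block, in level order. A small sketch of what can happen otherwise, using a made-up data frame y whose groups are interleaved (the data and the names by_group and by_row are hypothetical):

```r
# Hypothetical data with the two groups interleaved rather than in blocks.
y <- data.frame(type  = c(1, 2, 1, 2, 1, 2),
                value = c(0.5, 0.9, 0.7, 0.3, 0.6, 1.0))

# unlist(tapply(...)) returns the running maxima group by group:
# all of type 1 first, then all of type 2, which no longer lines up with rows.
by_group <- unlist(tapply(y$value, y$type, FUN = cummax), use.names = FALSE)

# ave() writes each group's running maximum back to that group's own rows.
by_row <- ave(y$value, y$type, FUN = cummax)

# Row-aligned comparison gives the per-group records.
y$record <- by_row == y$value
y
```

If the data might not be sorted by group, sorting first or using ave is probably the safer choice.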

You could also do:

library(dplyr)

x %>%
  group_by(type) %>%
  mutate(record = cummax(value) == value)

Upvotes: 1

lroha

Reputation: 34461

You can compare each value against the cumulative max within its group.

x$record <- as.logical(with(x, ave(value, type, FUN = \(v) v == cummax(v))))
x

   time type     value record
1     1    1 0.5058405   TRUE
2     2    1 0.5948084   TRUE
3     3    1 0.8087471   TRUE
4     4    1 0.7288197  FALSE
5     5    1 0.1519876  FALSE
6     6    1 0.9561873   TRUE
7     1    2 0.7535377   TRUE
8     2    2 0.8520623   TRUE
9     3    2 0.6734418  FALSE
10    4    2 0.3871255  FALSE
11    5    2 0.6580025  FALSE
12    6    2 0.3213696  FALSE
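
In case the as.logical() wrapper looks redundant: ave() fills the result into a copy of its numeric input, so the TRUE/FALSE values come back as 1/0. A small sketch with made-up vectors v and g (not the question's data), written with function(z) instead of the \(v) shorthand, which needs R 4.1 or later:

```r
# Hypothetical values and grouping, for illustration only.
v <- c(1, 3, 2, 5, 4, 6)
g <- c("a", "a", "a", "b", "b", "b")

# ave() stores the per-group result in a numeric vector, so logicals become 1/0.
raw <- ave(v, g, FUN = function(z) z == cummax(z))
is.numeric(raw)

# Convert back to TRUE/FALSE.
record <- as.logical(raw)
record
```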

Upvotes: 2
