user1165199
user1165199

Reputation: 6659

Find how long it has been since minimum value

I have a data frame in with a variable, value (which begins with NAs) and time at which that value is valid. For each row I want a new value determining the current cummin and how long it has been since that minimum.

I have the cummin part, but note the 'how long' part.

For example, here is my dataframe

example <- data.frame(variable = c('a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b'),
                      time = c(Sys.time(), Sys.time() + 5, Sys.time() + 15, Sys.time() + 34, Sys.time() + 51, Sys.time(), Sys.time() + 10, Sys.time() + 39, Sys.time() + 63, Sys.time() + 79, Sys.time() + 102, Sys.time() + 110, Sys.time() + 115),
                      value = c(NA, 3, 6, 2, 8, NA, NA, 1, 4, 3, 2, 0, 3))

I am currently using this to get the cummin of the value by the variable

cum_na.rm <- function(x, func){
    x[!is.na(x)] <- func(x[!is.na(x)])
    x
}
example %>% 
    group_by(variable) %>% 
    mutate(current_min = cum_na.rm(value, cummin))

I want to add something to get the time since the current_min to get something along the lines of the following:

   variable                time value current_min time_since_min
 1        a 2017-10-02 17:13:59    NA          NA             NA
 2        a 2017-10-02 17:14:04     3           3         0 secs
 3        a 2017-10-02 17:14:14     6           3        10 secs
 4        a 2017-10-02 17:14:33     2           2         0 secs
 5        a 2017-10-02 17:14:50     8           2        17 secs 
 6        b 2017-10-02 17:13:59    NA          NA             NA
 7        b 2017-10-02 17:14:09    NA          NA             NA
 8        b 2017-10-02 17:14:38     1           1         0 secs
 9        b 2017-10-02 17:15:02     4           1        24 secs
10        b 2017-10-02 17:15:18     3           1        40 secs
11        b 2017-10-02 17:15:41     2           1        63 secs
12        b 2017-10-02 17:15:49     0           0         0 secs
13        b 2017-10-02 17:15:54     3           0         5 secs

The actual dataframe is over 1,000,000 rows with lots of variables, so anything which is particularly slow may be problematic!

Any help would be appreciated

Upvotes: 2

Views: 52

Answers (1)

jeremycg
jeremycg

Reputation: 24965

You can add on a group_by for current_min, then add in the time - time[1].

You will get some weird results for NA, so you can remove them afterwards

example %>% 
  group_by(variable) %>% 
  mutate(current_min = cum_na.rm(value, cummin)) %>%
  group_by(current_min) %>%
  mutate(time_since_min = time - time[1])

Upvotes: 2

Related Questions