Louise
Louise

Reputation: 151

Finding the max and minimum value for a daily series in R?

I have hourly data for a year that I would like to find the maximum and minimum values per day. How would I go about doing this while retaining the time data associated with the max/min values? My goal is to produce a graph that smooths out the datapoints in between the max/min values, so I would need to keep the timestamp column information associated with the max/min values.

timestamp        & VALUES   &  &  &  \\
2016-01-01 0:00  & \#VALUE! &  &  &  \\
2016-01-01 1:00  & 2        &  &  &  \\
2016-01-01 2:00  & 0.5      &  &  &  \\
2016-01-01 3:00  & -1       &  &  &  \\
2016-01-01 4:00  & -2       &  &  &  \\
2016-01-01 5:00  & 4        &  &  &  \\
2016-01-01 6:00  & 2        &  &  &  \\
2016-01-01 7:00  & 0        &  &  &  \\
2016-01-01 8:00  & 5        &  &  &  \\
2016-01-01 9:00  & 61.5     &  &  &  \\
2016-01-01 10:00 & 19       &  &  &  \\
2016-01-01 11:00 & 3.5      &  &  &  \\
2016-01-01 12:00 & -1.5     &  &  &  \\
2016-01-01 13:00 & 9        &  &  &  \\
2016-01-01 14:00 & 0.5      &  &  &  \\
2016-01-01 15:00 & 0        &  &  &  \\
2016-01-01 16:00 & -8       &  &  &  \\
2016-01-01 17:00 & 7.5      &  &  &  \\
2016-01-01 18:00 & -9       &  &  &  \\
2016-01-01 19:00 & -80.5    &  &  &  \\
2016-01-01 20:00 & -9       &  &  &  \\
2016-01-01 21:00 & -0.5     &  &  &  \\
2016-01-01 22:00 & -0.5     &  &  &  \\
2016-01-01 23:00 & -2       &  &  & 

Thanks in advance!

Upvotes: 0

Views: 1232

Answers (3)

Rostom
Rostom

Reputation: 1

A tidyverse solution assuming that your data frame is called df

library(dplyr)
library(lubridate)

result <- df %>%
  mutate(timestamp = as.POSIXct(timestamp), 
         Date = lubridate::date(timestamp)) %>%
  arrange(Date, VALUES) %>%
  group_by(Date) %>%
  summarise(., across(VALUES, max, .names = "max_{.col}",
               across(VALUES, min, .names = "min_{.col}")

Upvotes: 0

jay.sf
jay.sf

Reputation: 72813

You may use strftime in ave. For sake of brevity, I make an example wit 4-hourly data

r <- transform(dat, min=ave(values, strftime(timestamp, '%F'), FUN=min),
               max=ave(values, strftime(timestamp, '%F'), FUN=max))
r
#           timestamp values min max
# 1   2016-01-01 0:00    -27 -66  13
# 2   2016-01-01 4:00    -32 -66  13
# 3   2016-01-01 8:00     13 -66  13
# 4  2016-01-01 12:00    -52 -66  13
# 5  2016-01-01 16:00    -66 -66  13
# 6  2016-01-01 20:00     12 -66  13
# 7   2016-01-02 0:00    -19 -53  19
# 8   2016-01-02 4:00     -8 -53  19
# 9   2016-01-02 8:00      8 -53  19
# 10 2016-01-02 12:00     18 -53  19
# 11 2016-01-02 16:00    -53 -53  19
# 12 2016-01-02 20:00     19 -53  19
# 13  2016-01-03 0:00     12 -74  42
# 14  2016-01-03 4:00     27 -74  42
# 15  2016-01-03 8:00    -74 -74  42
# 16 2016-01-03 12:00    -31 -74  42
# 17 2016-01-03 16:00     42 -74  42
# 18 2016-01-03 20:00    -62 -74  42

However, if there are missings in your data, you'll need an anonymus function.

dat[7, 2] <- NA

transform(
  dat, 
  min=ave(values, strftime(timestamp, '%F'), FUN=\(x) min(x, na.rm=TRUE)), 
  max=ave(values, strftime(timestamp, '%F'), FUN=\(x) max(x, na.rm=TRUE)))   
#           timestamp values min max
# 1   2016-01-01 0:00    -27 -66  13
# 2   2016-01-01 4:00    -32 -66  13
# 3   2016-01-01 8:00     13 -66  13
# 4  2016-01-01 12:00    -52 -66  13
# 5  2016-01-01 16:00    -66 -66  13
# 6  2016-01-01 20:00     12 -66  13
# 7   2016-01-02 0:00     NA -53  19
# 8   2016-01-02 4:00     -8 -53  19
# 9   2016-01-02 8:00      8 -53  19
# 10 2016-01-02 12:00     18 -53  19
# 11 2016-01-02 16:00    -53 -53  19
# 12 2016-01-02 20:00     19 -53  19
# 13  2016-01-03 0:00     12 -74  42
# 14  2016-01-03 4:00     27 -74  42
# 15  2016-01-03 8:00    -74 -74  42
# 16 2016-01-03 12:00    -31 -74  42
# 17 2016-01-03 16:00     42 -74  42
# 18 2016-01-03 20:00    -62 -74  42

Even a bit prettier would be a real timestamp in POSIXct format.

r <- transform(dat, timestamp=as.POSIXct(timestamp),
               min=ave(values, strftime(timestamp, '%F'), FUN=min),
               max=ave(values, strftime(timestamp, '%F'), FUN=max))
r
#              timestamp values min max
# 1  2016-01-01 00:00:00    -27 -66  13
# 2  2016-01-01 04:00:00    -32 -66  13
# 3  2016-01-01 08:00:00     13 -66  13
# 4  2016-01-01 12:00:00    -52 -66  13
# 5  2016-01-01 16:00:00    -66 -66  13
# 6  2016-01-01 20:00:00     12 -66  13
# 7  2016-01-02 00:00:00     NA  NA  NA
# 8  2016-01-02 04:00:00     -8  NA  NA
# 9  2016-01-02 08:00:00      8  NA  NA
# 10 2016-01-02 12:00:00     18  NA  NA
# 11 2016-01-02 16:00:00    -53  NA  NA
# 12 2016-01-02 20:00:00     19  NA  NA
# 13 2016-01-03 00:00:00     12 -74  42
# 14 2016-01-03 04:00:00     27 -74  42
# 15 2016-01-03 08:00:00    -74 -74  42
# 16 2016-01-03 12:00:00    -31 -74  42
# 17 2016-01-03 16:00:00     42 -74  42
# 18 2016-01-03 20:00:00    -62 -74  42

Data:

dat <- structure(list(timestamp = c("2016-01-01 0:00", "2016-01-01 2:00", 
"2016-01-01 4:00", "2016-01-01 6:00", "2016-01-01 8:00", "2016-01-01 10:00", 
"2016-01-01 12:00", "2016-01-01 14:00", "2016-01-01 16:00", "2016-01-01 18:00", 
"2016-01-01 20:00", "2016-01-01 22:00", "2016-01-02 0:00", "2016-01-02 2:00", 
"2016-01-02 4:00", "2016-01-02 6:00", "2016-01-02 8:00", "2016-01-02 10:00", 
"2016-01-02 12:00", "2016-01-02 14:00", "2016-01-02 16:00", "2016-01-02 18:00", 
"2016-01-02 20:00", "2016-01-02 22:00", "2016-01-03 0:00", "2016-01-03 2:00", 
"2016-01-03 4:00", "2016-01-03 6:00", "2016-01-03 8:00", "2016-01-03 10:00", 
"2016-01-03 12:00", "2016-01-03 14:00", "2016-01-03 16:00", "2016-01-03 18:00", 
"2016-01-03 20:00", "2016-01-03 22:00"), values = c(7, 12, NA, 
-70, -4, -22, -13, -76, 13, 45, 48, 55, -64, -30, -20, -8, -10, 
40, -32, 3, -67, -66, -74, -75, 57, 16, -31, -17, 9, 7, -66, 
13, 41, 58, 26, 58)), class = "data.frame", row.names = c(NA, 
-36L))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

Convert timestamp to POSIXct, extract date from it and for each date keep the row that has max and min value.

library(dplyr)
library(lubridate)

result <- df %>%
  mutate(timestamp = mdy_hm(timestamp), 
         date = as.Date(timestamp)) %>%
  arrange(date, VALUES) %>%
  group_by(date) %>%
  slice(1, n())

Upvotes: 2

Related Questions