Reputation: 151
I have hourly data for a year that I would like to find the maximum and minimum values per day. How would I go about doing this while retaining the time data associated with the max/min values? My goal is to produce a graph that smooths out the datapoints in between the max/min values, so I would need to keep the timestamp column information associated with the max/min values.
timestamp & VALUES & & & \\
2016-01-01 0:00 & \#VALUE! & & & \\
2016-01-01 1:00 & 2 & & & \\
2016-01-01 2:00 & 0.5 & & & \\
2016-01-01 3:00 & -1 & & & \\
2016-01-01 4:00 & -2 & & & \\
2016-01-01 5:00 & 4 & & & \\
2016-01-01 6:00 & 2 & & & \\
2016-01-01 7:00 & 0 & & & \\
2016-01-01 8:00 & 5 & & & \\
2016-01-01 9:00 & 61.5 & & & \\
2016-01-01 10:00 & 19 & & & \\
2016-01-01 11:00 & 3.5 & & & \\
2016-01-01 12:00 & -1.5 & & & \\
2016-01-01 13:00 & 9 & & & \\
2016-01-01 14:00 & 0.5 & & & \\
2016-01-01 15:00 & 0 & & & \\
2016-01-01 16:00 & -8 & & & \\
2016-01-01 17:00 & 7.5 & & & \\
2016-01-01 18:00 & -9 & & & \\
2016-01-01 19:00 & -80.5 & & & \\
2016-01-01 20:00 & -9 & & & \\
2016-01-01 21:00 & -0.5 & & & \\
2016-01-01 22:00 & -0.5 & & & \\
2016-01-01 23:00 & -2 & & &
Thanks in advance!
Upvotes: 0
Views: 1232
Reputation: 1
A tidyverse
solution assuming that your data frame is called df
library(dplyr)
library(lubridate)
result <- df %>%
mutate(timestamp = as.POSIXct(timestamp),
Date = lubridate::date(timestamp)) %>%
arrange(Date, VALUES) %>%
group_by(Date) %>%
summarise(., across(VALUES, max, .names = "max_{.col}",
across(VALUES, min, .names = "min_{.col}")
Upvotes: 0
Reputation: 72813
You may use strftime
in ave
. For sake of brevity, I make an example wit 4-hourly data
r <- transform(dat, min=ave(values, strftime(timestamp, '%F'), FUN=min),
max=ave(values, strftime(timestamp, '%F'), FUN=max))
r
# timestamp values min max
# 1 2016-01-01 0:00 -27 -66 13
# 2 2016-01-01 4:00 -32 -66 13
# 3 2016-01-01 8:00 13 -66 13
# 4 2016-01-01 12:00 -52 -66 13
# 5 2016-01-01 16:00 -66 -66 13
# 6 2016-01-01 20:00 12 -66 13
# 7 2016-01-02 0:00 -19 -53 19
# 8 2016-01-02 4:00 -8 -53 19
# 9 2016-01-02 8:00 8 -53 19
# 10 2016-01-02 12:00 18 -53 19
# 11 2016-01-02 16:00 -53 -53 19
# 12 2016-01-02 20:00 19 -53 19
# 13 2016-01-03 0:00 12 -74 42
# 14 2016-01-03 4:00 27 -74 42
# 15 2016-01-03 8:00 -74 -74 42
# 16 2016-01-03 12:00 -31 -74 42
# 17 2016-01-03 16:00 42 -74 42
# 18 2016-01-03 20:00 -62 -74 42
However, if there are missings in your data, you'll need an anonymus function.
dat[7, 2] <- NA
transform(
dat,
min=ave(values, strftime(timestamp, '%F'), FUN=\(x) min(x, na.rm=TRUE)),
max=ave(values, strftime(timestamp, '%F'), FUN=\(x) max(x, na.rm=TRUE)))
# timestamp values min max
# 1 2016-01-01 0:00 -27 -66 13
# 2 2016-01-01 4:00 -32 -66 13
# 3 2016-01-01 8:00 13 -66 13
# 4 2016-01-01 12:00 -52 -66 13
# 5 2016-01-01 16:00 -66 -66 13
# 6 2016-01-01 20:00 12 -66 13
# 7 2016-01-02 0:00 NA -53 19
# 8 2016-01-02 4:00 -8 -53 19
# 9 2016-01-02 8:00 8 -53 19
# 10 2016-01-02 12:00 18 -53 19
# 11 2016-01-02 16:00 -53 -53 19
# 12 2016-01-02 20:00 19 -53 19
# 13 2016-01-03 0:00 12 -74 42
# 14 2016-01-03 4:00 27 -74 42
# 15 2016-01-03 8:00 -74 -74 42
# 16 2016-01-03 12:00 -31 -74 42
# 17 2016-01-03 16:00 42 -74 42
# 18 2016-01-03 20:00 -62 -74 42
Even a bit prettier would be a real timestamp in POSIXct
format.
r <- transform(dat, timestamp=as.POSIXct(timestamp),
min=ave(values, strftime(timestamp, '%F'), FUN=min),
max=ave(values, strftime(timestamp, '%F'), FUN=max))
r
# timestamp values min max
# 1 2016-01-01 00:00:00 -27 -66 13
# 2 2016-01-01 04:00:00 -32 -66 13
# 3 2016-01-01 08:00:00 13 -66 13
# 4 2016-01-01 12:00:00 -52 -66 13
# 5 2016-01-01 16:00:00 -66 -66 13
# 6 2016-01-01 20:00:00 12 -66 13
# 7 2016-01-02 00:00:00 NA NA NA
# 8 2016-01-02 04:00:00 -8 NA NA
# 9 2016-01-02 08:00:00 8 NA NA
# 10 2016-01-02 12:00:00 18 NA NA
# 11 2016-01-02 16:00:00 -53 NA NA
# 12 2016-01-02 20:00:00 19 NA NA
# 13 2016-01-03 00:00:00 12 -74 42
# 14 2016-01-03 04:00:00 27 -74 42
# 15 2016-01-03 08:00:00 -74 -74 42
# 16 2016-01-03 12:00:00 -31 -74 42
# 17 2016-01-03 16:00:00 42 -74 42
# 18 2016-01-03 20:00:00 -62 -74 42
Data:
dat <- structure(list(timestamp = c("2016-01-01 0:00", "2016-01-01 2:00",
"2016-01-01 4:00", "2016-01-01 6:00", "2016-01-01 8:00", "2016-01-01 10:00",
"2016-01-01 12:00", "2016-01-01 14:00", "2016-01-01 16:00", "2016-01-01 18:00",
"2016-01-01 20:00", "2016-01-01 22:00", "2016-01-02 0:00", "2016-01-02 2:00",
"2016-01-02 4:00", "2016-01-02 6:00", "2016-01-02 8:00", "2016-01-02 10:00",
"2016-01-02 12:00", "2016-01-02 14:00", "2016-01-02 16:00", "2016-01-02 18:00",
"2016-01-02 20:00", "2016-01-02 22:00", "2016-01-03 0:00", "2016-01-03 2:00",
"2016-01-03 4:00", "2016-01-03 6:00", "2016-01-03 8:00", "2016-01-03 10:00",
"2016-01-03 12:00", "2016-01-03 14:00", "2016-01-03 16:00", "2016-01-03 18:00",
"2016-01-03 20:00", "2016-01-03 22:00"), values = c(7, 12, NA,
-70, -4, -22, -13, -76, 13, 45, 48, 55, -64, -30, -20, -8, -10,
40, -32, 3, -67, -66, -74, -75, 57, 16, -31, -17, 9, 7, -66,
13, 41, 58, 26, 58)), class = "data.frame", row.names = c(NA,
-36L))
Upvotes: 1
Reputation: 388982
Convert timestamp
to POSIXct
, extract date from it and for each date keep the row that has max and min value.
library(dplyr)
library(lubridate)
result <- df %>%
mutate(timestamp = mdy_hm(timestamp),
date = as.Date(timestamp)) %>%
arrange(date, VALUES) %>%
group_by(date) %>%
slice(1, n())
Upvotes: 2