Reputation: 1133
I would like to calculate the hourly max of several variables in a data frame, where my data look like this:
Date Gas Diesel
2018-01-01 00:00:00 1.5 2.5
2018-01-01 00:01:00 1.7 1.9
2018-01-01 00:02:00 2.2 1.7
---
---
2018-01-01 01:00:00 2.0 1.0
2018-01-01 01:02:00 2.0 1.0
And my expected output is:
Date max.Gas max.Diesel
2018-01-01 00:00:00 2.2 2.5
2018-01-01 01:00:00 2.0 1.0
Any thoughts on how I can achieve this in R, preferably in a tidyverse style?
Upvotes: 0
Views: 96
Reputation: 6073
A dplyr/tidyverse approach:
# load tidyverse packages
library(dplyr)
library(lubridate)
# create example data
mydf <- tibble(
date = ymd_hms(c("2018-01-01 00:00:00", "2018-01-01 00:01:00", "2018-01-01 00:02:00", "2018-01-01 01:00:00", "2018-01-01 01:01:00")),
gas = c(1.5, 1.7, 2.2, 2.0, 2.0),
diesel = c(2.5, 1.9, 1.7, 1.0, 1.0)
)
# create a new variable "datehour" and get max(gas) by datehour
mydf %>%
mutate(datehour = floor_date(date, unit="hour")) %>%
group_by(datehour) %>%
summarize(maxgas = max(gas), maxdiesel = max(diesel))
To do this for many columns without typing, e.g., maxgas = max(gas)
for each column, use summarize_all
:
mydf %>%
mutate(datehour = floor_date(date, unit="hour")) %>%
select(-date) %>%
group_by(datehour) %>%
summarize_all(max)
Upvotes: 2