Reputation: 61
I have a dataframe with temperature measurements every 10 minutes. The measurements were taken at different locations (named as 'LCZ'), with for each location the values in a different column.
This is part of my dataframe: (it also contains missing values NA)
Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2017-08-26 17:00:00 27.5 27.5 27.5 27.0 27.0
2 2017-08-26 17:10:00 27.5 27.0 27.5 27.0 27.0
3 2017-08-26 17:20:00 27.5 27.0 27.0 27.0 27.0
4 2017-08-26 17:30:00 27.0 26.5 27.0 26.5 26.5
5 2017-08-26 17:40:00 26.5 26.5 26.5 26.5 26.5
6 2017-08-26 17:50:00 26.5 26.0 26.5 26.0 26.5
7 2017-08-26 18:00:00 26.5 26.0 26.5 26.5 26.5
8 2017-08-26 18:10:00 27.0 26.0 26.5 26.5 26.0
9 2017-08-26 18:20:00 26.5 26.5 26.5 26.5 26.0
10 2017-08-26 18:30:00 26.5 26.5 26.5 26.5 26.0
I want for each location or column to calculate the hourly min/max/median temperature and in addition for the hourly min/max also the timestamp from the original data at which the min/max respectively occured.
Is this possible with R?
I tried already various functions.
group_by
allowed me to calculate min/max for each column but without timestamps. period.apply
also allowed me to calculate min/max/median but only for one column.
Also aggregate()
didn't lead to any succes.
I'm learning in R, but didn't came close to a solution on this problem.
This website has helped me with various problems, but I'm really stuck on this one. Can someone help? Thanks in advance
Upvotes: 5
Views: 1702
Reputation: 23889
Here is a way of doing it with dplyr
verbs:
library(lubridate)
df %>%
gather(Location, Temp, -Time) %>%
group_by(Date = date(Time), HoD = hour(Time), Location) %>%
mutate_at(.vars = "Temp", .funs = list(Min = min, Max = max, Median = median)) %>%
filter(Temp == Min | Temp == Max) %>%
arrange(Location, Time) %>%
distinct(Temp, .keep_all = T) %>%
mutate(MinMax = ifelse(Temp == Min, "MinTime", "MaxTime")) %>%
dplyr::select(-Temp) %>%
spread("MinMax", "Time")
Output:
Notice the NA
which mean that minimum and maximum temperatures were identical on that day, in that hour and at that location.
# A tibble: 10 x 8
# Groups: Date, HoD, Location [10]
Location Date HoD Min Max Median MaxTime MinTime
<chr> <date> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 LCZ.3.10 2017-08-26 17 26.0 27.5 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
2 LCZ.3.10 2017-08-26 18 26.0 26.5 26.2 2017-08-26 18:20:00 2017-08-26 18:00:00
3 LCZ.3.2 2017-08-26 17 26.5 27.5 27.2 2017-08-26 17:00:00 2017-08-26 17:40:00
4 LCZ.3.2 2017-08-26 18 26.5 27.0 26.5 2017-08-26 18:10:00 2017-08-26 18:00:00
5 LCZ.6.1 2017-08-26 17 26.5 27.5 27.0 2017-08-26 17:00:00 2017-08-26 17:40:00
6 LCZ.6.1 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
7 LCZ.6.9 2017-08-26 17 26.0 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
8 LCZ.6.9 2017-08-26 18 26.5 26.5 26.5 NA 2017-08-26 18:00:00
9 LCZ.9.4 2017-08-26 17 26.5 27.0 26.8 2017-08-26 17:00:00 2017-08-26 17:30:00
10 LCZ.9.4 2017-08-26 18 26.0 26.5 26.0 2017-08-26 18:00:00 2017-08-26 18:10:00
Upvotes: 3
Reputation: 20095
Not very sure in which format OP
wants results to be presented. One solution could be found using mutate_at
as:
library(lubridate)
library(dplyr)
result <- df %>% mutate(Time = ymd_hms(Time)) %>%
group_by(Hourly = format(Time, "%Y%m%d%H")) %>%
mutate_at(vars(starts_with("LCZ")), funs(min = min, max = max, med = median )) %>%
select(Time, Hourly, sort(names(select(.,-Time-Hourly))))
result[,1:9]
# # A tibble: 10 x 9
# # Groups: Hourly [2]
# Time Hourly LCZ3_02 LCZ3_02_max LCZ3_02_med LCZ3_10 LCZ3_10_max LCZ3_10_med LCZ3_10_min
# <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2017-08-26 17:00:00 2017082617 27.5 27.5 27.2 27.5 27.5 26.8 26.0
# 2 2017-08-26 17:10:00 2017082617 27.5 27.5 27.2 27.0 27.5 26.8 26.0
# 3 2017-08-26 17:20:00 2017082617 27.5 27.5 27.2 27.0 27.5 26.8 26.0
# 4 2017-08-26 17:30:00 2017082617 27.0 27.5 27.2 26.5 27.5 26.8 26.0
# 5 2017-08-26 17:40:00 2017082617 26.5 27.5 27.2 26.5 27.5 26.8 26.0
# 6 2017-08-26 17:50:00 2017082617 26.5 27.5 27.2 26.0 27.5 26.8 26.0
# 7 2017-08-26 18:00:00 2017082618 26.5 27.0 26.5 26.0 26.5 26.2 26.0
# 8 2017-08-26 18:10:00 2017082618 27.0 27.0 26.5 26.0 26.5 26.2 26.0
# 9 2017-08-26 18:20:00 2017082618 26.5 27.0 26.5 26.5 26.5 26.2 26.0
# 10 2017-08-26 18:30:00 2017082618 26.5 27.0 26.5 26.5 26.5 26.2 26.0
Data
df <- read.table(text =
"Time LCZ3_02 LCZ3_10 LCZ6_01 LCZ6_09 LCZ9_04
1 '2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
2 '2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
3 '2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
4 '2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
5 '2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
6 '2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
7 '2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
8 '2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
9 '2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
10 '2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2
Reputation: 50678
Here is a tidyverse
solution.
Explanation: We create a new hour-floor
ed time column Time.hour
, by which we can group; we then calculate the necessary summary statistics.
res <- df %>%
mutate(Time = as.POSIXct(Time, format = "%Y-%m-%d %H:%M:%S")) %>% # Time as POSIXct
gather(location, value, -Time) %>%
mutate(Time.hour = format(Time, "%y-%m-%d %H")) %>%
group_by(Time.hour, location) %>%
summarise(min = min(value), max = max(value), median = median(value));
res;
## A tibble: 10 x 5
## Groups: Time.hour [?]
# Time.hour location min max median
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 17-08-26 17 LCZ.3.10 26.0 27.5 26.8
# 2 17-08-26 17 LCZ.3.2 26.5 27.5 27.2
# 3 17-08-26 17 LCZ.6.1 26.5 27.5 27.0
# 4 17-08-26 17 LCZ.6.9 26.0 27.0 26.8
# 5 17-08-26 17 LCZ.9.4 26.5 27.0 26.8
# 6 17-08-26 18 LCZ.3.10 26.0 26.5 26.2
# 7 17-08-26 18 LCZ.3.2 26.5 27.0 26.5
# 8 17-08-26 18 LCZ.6.1 26.5 26.5 26.5
# 9 17-08-26 18 LCZ.6.9 26.5 26.5 26.5
#10 17-08-26 18 LCZ.9.4 26.0 26.5 26.0
If need be, convert to wide:
res %>%
ungroup() %>%
gather(what, val, min:median) %>%
unite(key, what, location) %>%
spread(key, val)
## A tibble: 2 x 16
# Time.hour max_LCZ.3.10 max_LCZ.3.2 max_LCZ.6.1 max_LCZ.6.9 max_LCZ.9.4
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 17-08-26 17 27.5 27.5 27.5 27.0 27.0
#2 17-08-26 18 26.5 27.0 26.5 26.5 26.5
## ... with 10 more variables: median_LCZ.3.10 <dbl>, median_LCZ.3.2 <dbl>,
## median_LCZ.6.1 <dbl>, median_LCZ.6.9 <dbl>, median_LCZ.9.4 <dbl>,
## min_LCZ.3.10 <dbl>, min_LCZ.3.2 <dbl>, min_LCZ.6.1 <dbl>,
## min_LCZ.6.9 <dbl>, min_LCZ.9.4 <dbl>
df <- read.table(text =
"Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
1 '2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
2 '2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
3 '2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
4 '2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
5 '2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
6 '2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
7 '2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
8 '2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
9 '2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
10 '2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0", header = T, row.names = 1)
Upvotes: 2
Reputation: 39154
We can use floor_date
from the lubridate package to create a new column Time2
to show the hourly information. If this is not the way you want to define the hourly grouping, you can also try round_date
or ceiling_date
. After that, we can use gather
from the tidyr package to convert the data frame from wide format to long format.
library(dplyr)
library(tidyr)
library(lubridate)
dat2 <- dat %>%
mutate(Time = ymd_hms(Time),
Time2 = floor_date(Time, unit = "hour")) %>%
gather(LCZ, Value, starts_with("LCZ")) %>%
group_by(Time2, LCZ)
After that, we can summarize the data by LCZ
and Time2
.
dat3 <- dat2 %>%
summarise(Min = min(Value, na.rm = TRUE),
Max = max(Value, na.rm = TRUE),
Median = median(Value, na.rm = TRUE)) %>%
ungroup()
dat3
# # A tibble: 10 x 5
# Time2 LCZ Min Max Median
# <dttm> <chr> <dbl> <dbl> <dbl>
# 1 2017-08-26 17:00:00 LCZ.3.10 26.0 27.5 26.8
# 2 2017-08-26 17:00:00 LCZ.3.2 26.5 27.5 27.2
# 3 2017-08-26 17:00:00 LCZ.6.1 26.5 27.5 27.0
# 4 2017-08-26 17:00:00 LCZ.6.9 26.0 27.0 26.8
# 5 2017-08-26 17:00:00 LCZ.9.4 26.5 27.0 26.8
# 6 2017-08-26 18:00:00 LCZ.3.10 26.0 26.5 26.2
# 7 2017-08-26 18:00:00 LCZ.3.2 26.5 27.0 26.5
# 8 2017-08-26 18:00:00 LCZ.6.1 26.5 26.5 26.5
# 9 2017-08-26 18:00:00 LCZ.6.9 26.5 26.5 26.5
# 10 2017-08-26 18:00:00 LCZ.9.4 26.0 26.5 26.0
If you want, we can create binary value to indicate if the value is minimum, maximum, or median as follows. This format is useful when you further want to filter the data frame.
dat4 <- dat2 %>%
mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,
Max = (Value == max(Value, na.rm = TRUE)) + 0L,
Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%
ungroup()
dat4
# # A tibble: 50 x 7
# Time Time2 LCZ Value Min Max Median
# <dttm> <dttm> <chr> <dbl> <int> <int> <int>
# 1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2 27.5 0 1 0
# 4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2 27.0 0 0 0
# 5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2 26.5 1 0 0
# 7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2 27.0 0 1 0
# 9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2 26.5 1 0 1
# # ... with 40 more rows
DATA
dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
'2017-08-26 17:00:00' 27.5 27.5 27.5 27.0 27.0
'2017-08-26 17:10:00' 27.5 27.0 27.5 27.0 27.0
'2017-08-26 17:20:00' 27.5 27.0 27.0 27.0 27.0
'2017-08-26 17:30:00' 27.0 26.5 27.0 26.5 26.5
'2017-08-26 17:40:00' 26.5 26.5 26.5 26.5 26.5
'2017-08-26 17:50:00' 26.5 26.0 26.5 26.0 26.5
'2017-08-26 18:00:00' 26.5 26.0 26.5 26.5 26.5
'2017-08-26 18:10:00' 27.0 26.0 26.5 26.5 26.0
'2017-08-26 18:20:00' 26.5 26.5 26.5 26.5 26.0
'2017-08-26 18:30:00' 26.5 26.5 26.5 26.5 26.0",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 6