Reputation: 1
I have a dataset of temperature values taken at specific datetimes across five locations. For whatever reason, sometimes the readings are every hour, and some every four hours. Another issue is that when the time changed as a result of daylight savings, the readings are off by one hour. I am interested in the readings taken every four hours and would like to subset these by day and night to ultimately get daily and nightly mean temperatures.
To summarise, the readings I am interested in are either:
0800, 1200, 1600 =day
2000, 0000, 0400 =night
Recordings between 0800-1600 and 2000-0400 each day should be averaged.
During daylight savings, the equivalent times are:
0900, 1300, 1700 =day
2100, 0100, 0500 =night
Recordings between 0900-1700 and 2100-0500 each day should be averaged.
In the process, I am hoping to subset by site.
There are also some NA
values or blank cells which should be ignored.
So far, I tried to subset by one hour of interest just to see if it worked, but haven't got any further than that. Any tips on how to subset by a series of times of interest? Thanks!
temperature <- read.csv("SeaTemperatureData.csv",
stringsAsFactors = FALSE)
temperature <- subset(temperature, select=-c(X)) #remove last column that contains comments, not needed
temperature$Date.Time < -as.POSIXct(temperature$Date.Time,
format="%d/%m/%Y %H:%M",
tz="Pacific/Auckland")
#subset data by time, we only want to include temperatures recorded at certain times
temperature.goat <- subset(temperature, Date.Time==c('01:00:00'), select=c("Goat.Island"))
Date.Time Goat.Island Tawharanui Kawau Tiritiri Noises
1 2019-06-10 16:00:00 16.820 16.892 16.749 16.677 15.819
2 2019-06-10 20:00:00 16.773 16.844 16.582 16.654 15.796
3 2019-06-11 00:00:00 16.749 16.820 16.749 16.606 15.819
4 2019-06-11 04:00:00 16.487 16.796 16.654 16.558 15.796
5 2019-06-11 08:00:00 16.582 16.749 16.487 16.463 15.867
6 2019-06-11 12:00:00 16.630 16.773 16.725 16.654 15.867
Upvotes: 0
Views: 1072
Reputation: 16178
One possible solution is to extract hours from your DateTime variable, then filter for particular hours of interest.
Here a fake example over 4 days:
library(lubridate)
df <- data.frame(DateTime = seq(ymd_hms("2020-02-01 00:00:00"), ymd_hms("2020-02-05 00:00:00"), by = "hour"),
Value = sample(1:100,97, replace = TRUE))
DateTime Value
1 2020-02-01 00:00:00 99
2 2020-02-01 01:00:00 51
3 2020-02-01 02:00:00 44
4 2020-02-01 03:00:00 49
5 2020-02-01 04:00:00 60
6 2020-02-01 05:00:00 56
Now, you can extract hours with hour
function of lubridate
and subset for the desired hour:
library(lubridate)
subset(df, hour(DateTime) == 5)
DateTime Value
6 2020-02-01 05:00:00 56
30 2020-02-02 05:00:00 31
54 2020-02-03 05:00:00 65
78 2020-02-04 05:00:00 80
EDIT: Getting mean of each sites per subset of hours
Per OP's request in comments, the question is to calcualte the mean of values for various sites for different period of times.
Basically, you want to have two period per days, one from 8:00 to 17:00 and the other one from 18:00 to 7:00.
Here, a more elaborated example based on the previous one:
df <- data.frame(DateTime = seq(ymd_hms("2020-02-01 00:00:00"), ymd_hms("2020-02-05 00:00:00"), by = "hour"),
Site1 = sample(1:100,97, replace = TRUE),
Site2 = sample(1:100,97, replace = TRUE))
DateTime Site1 Site2
1 2020-02-01 00:00:00 100 6
2 2020-02-01 01:00:00 9 49
3 2020-02-01 02:00:00 86 12
4 2020-02-01 03:00:00 34 55
5 2020-02-01 04:00:00 76 29
6 2020-02-01 05:00:00 41 1
....
So, now you can do the following to label each time point as daily or night, then group by this category for each day and calculate the mean of each individual sites using summarise_at
:
library(lubridate)
library(dplyr)
df %>% mutate(Date = date(DateTime),
Hour= hour(DateTime),
Category = ifelse(between(hour(DateTime),8,17),"Daily","Night")) %>%
group_by(Date, Category) %>%
summarise_at(vars(c(Site1,Site2)), ~ mean(., na.rm = TRUE))
# A tibble: 9 x 4
# Groups: Date [5]
Date Category Site1 Site2
<date> <chr> <dbl> <dbl>
1 2020-02-01 Daily 56.9 63.1
2 2020-02-01 Night 58.9 46.6
3 2020-02-02 Daily 54.5 47.6
4 2020-02-02 Night 36.9 41.7
5 2020-02-03 Daily 42.3 56.9
6 2020-02-03 Night 44.1 55.9
7 2020-02-04 Daily 54.3 50.4
8 2020-02-04 Night 54.8 34.3
9 2020-02-05 Night 75 16
Does it answer your question ?
Upvotes: 0