Louise
Louise

Reputation: 1

How to subset data by specific hours of interest?

I have a dataset of temperature values taken at specific datetimes across five locations. For whatever reason, sometimes the readings are every hour, and some every four hours. Another issue is that when the time changed as a result of daylight savings, the readings are off by one hour. I am interested in the readings taken every four hours and would like to subset these by day and night to ultimately get daily and nightly mean temperatures.

To summarise, the readings I am interested in are either:

0800, 1200, 1600 =day
2000, 0000, 0400 =night

Recordings between 0800-1600 and 2000-0400 each day should be averaged.

During daylight savings, the equivalent times are:

0900, 1300, 1700 =day
2100, 0100, 0500 =night

Recordings between 0900-1700 and 2100-0500 each day should be averaged.

In the process, I am hoping to subset by site. There are also some NA values or blank cells which should be ignored.

So far, I tried to subset by one hour of interest just to see if it worked, but haven't got any further than that. Any tips on how to subset by a series of times of interest? Thanks!

temperature <- read.csv("SeaTemperatureData.csv", 
                      stringsAsFactors = FALSE) 
temperature <- subset(temperature, select=-c(X)) #remove last column that contains comments, not needed
temperature$Date.Time < -as.POSIXct(temperature$Date.Time,
                                    format="%d/%m/%Y %H:%M",
                                    tz="Pacific/Auckland")
#subset data by time, we only want to include temperatures recorded at certain times
temperature.goat <- subset(temperature, Date.Time==c('01:00:00'), select=c("Goat.Island"))
            Date.Time Goat.Island Tawharanui  Kawau Tiritiri Noises
1 2019-06-10 16:00:00      16.820     16.892 16.749   16.677 15.819
2 2019-06-10 20:00:00      16.773     16.844 16.582   16.654 15.796
3 2019-06-11 00:00:00      16.749     16.820 16.749   16.606 15.819
4 2019-06-11 04:00:00      16.487     16.796 16.654   16.558 15.796
5 2019-06-11 08:00:00      16.582     16.749 16.487   16.463 15.867
6 2019-06-11 12:00:00      16.630     16.773 16.725   16.654 15.867

Upvotes: 0

Views: 1072

Answers (1)

dc37
dc37

Reputation: 16178

One possible solution is to extract hours from your DateTime variable, then filter for particular hours of interest.

Here a fake example over 4 days:

library(lubridate)

df <- data.frame(DateTime = seq(ymd_hms("2020-02-01 00:00:00"), ymd_hms("2020-02-05 00:00:00"), by = "hour"),
                 Value = sample(1:100,97, replace = TRUE))

             DateTime Value
1 2020-02-01 00:00:00    99
2 2020-02-01 01:00:00    51
3 2020-02-01 02:00:00    44
4 2020-02-01 03:00:00    49
5 2020-02-01 04:00:00    60
6 2020-02-01 05:00:00    56

Now, you can extract hours with hour function of lubridate and subset for the desired hour:

library(lubridate)

subset(df, hour(DateTime) == 5)

              DateTime Value
6  2020-02-01 05:00:00    56
30 2020-02-02 05:00:00    31
54 2020-02-03 05:00:00    65
78 2020-02-04 05:00:00    80

EDIT: Getting mean of each sites per subset of hours

Per OP's request in comments, the question is to calcualte the mean of values for various sites for different period of times.

Basically, you want to have two period per days, one from 8:00 to 17:00 and the other one from 18:00 to 7:00.

Here, a more elaborated example based on the previous one:

df <- data.frame(DateTime = seq(ymd_hms("2020-02-01 00:00:00"), ymd_hms("2020-02-05 00:00:00"), by = "hour"),
                 Site1 = sample(1:100,97, replace = TRUE),
                 Site2 = sample(1:100,97, replace = TRUE))

             DateTime Site1 Site2
1 2020-02-01 00:00:00   100     6
2 2020-02-01 01:00:00     9    49
3 2020-02-01 02:00:00    86    12
4 2020-02-01 03:00:00    34    55
5 2020-02-01 04:00:00    76    29
6 2020-02-01 05:00:00    41     1
....

So, now you can do the following to label each time point as daily or night, then group by this category for each day and calculate the mean of each individual sites using summarise_at:

library(lubridate)
library(dplyr)

df %>% mutate(Date = date(DateTime),
              Hour= hour(DateTime),
              Category = ifelse(between(hour(DateTime),8,17),"Daily","Night")) %>%
  group_by(Date, Category) %>%
  summarise_at(vars(c(Site1,Site2)), ~ mean(., na.rm = TRUE))

# A tibble: 9 x 4
# Groups:   Date [5]
  Date       Category Site1 Site2
  <date>     <chr>    <dbl> <dbl>
1 2020-02-01 Daily     56.9  63.1
2 2020-02-01 Night     58.9  46.6
3 2020-02-02 Daily     54.5  47.6
4 2020-02-02 Night     36.9  41.7
5 2020-02-03 Daily     42.3  56.9
6 2020-02-03 Night     44.1  55.9
7 2020-02-04 Daily     54.3  50.4
8 2020-02-04 Night     54.8  34.3
9 2020-02-05 Night     75    16  

Does it answer your question ?

Upvotes: 0

Related Questions