How to subset time data (factor) into hourly intervals

Question

I have a data.frame with 1080 rows and the variables date, time and glob.rad.

      date      time         glob.rad
1   2014/07/19  00:00:00     -1.6
2   2014/07/19  00:02:00     -1.6
3   2014/07/19  00:03:00     -1.6
4   2014/07/19  00:04:00     -1.6
5   2014/07/19  00:06:00     -1.6
6   2014/07/19  00:07:00     -1.6
7   2014/07/19  00:08:00     -1.6
8   2014/07/19  00:10:00     -1.6
9   2014/07/19  00:11:00     -1.6
10  2014/07/19  00:12:00     -1.6
11  2014/07/19  00:14:00     -1.6
12  2014/07/19  00:15:00     -1.6
13  2014/07/19  00:16:00     -1.6
14  2014/07/19  00:18:00     -1.5
15  2014/07/19  00:19:00     -1.5
16  2014/07/19  00:20:00     -1.4
17  2014/07/19  00:22:00     -1.4
18  2014/07/19  00:23:00     -1.3
19  2014/07/19  00:24:00     -1.3
20  2014/07/19  00:26:00     -1.3
21  2014/07/19  00:27:00     -1.3
22  2014/07/19  00:28:00     -1.3
23  2014/07/19  00:30:00     -1.3
24  2014/07/19  00:31:00     -1.4
25  2014/07/19  00:32:00     -1.4
26  2014/07/19  00:34:00     -1.5
27  2014/07/19  00:35:00     -1.5
28  2014/07/19  00:36:00     -1.6
29  2014/07/19  00:38:00     -1.6
30  2014/07/19  00:39:00     -1.6
31  2014/07/19  00:40:00     -1.6
32  2014/07/19  00:42:00     -1.6
33  2014/07/19  00:43:00     -1.6
34  2014/07/19  00:44:00     -1.6
35  2014/07/19  00:46:00     -1.6
36  2014/07/19  00:47:00     -1.6
37  2014/07/19  00:48:00     -1.6
38  2014/07/19  00:50:00     -1.6
39  2014/07/19  00:51:00     -1.6
40  2014/07/19  00:52:00     -1.6
41  2014/07/19  00:54:00     -1.6
42  2014/07/19  00:55:00     -1.6
43  2014/07/19  00:56:00     -1.6
44  2014/07/19  00:58:00     -1.6
45  2014/07/19  00:59:00     -1.6
46  2014/07/19  01:00:00     -1.6
47  2014/07/19  01:02:00     -1.6
48  2014/07/19  01:03:00     -1.6
49  2014/07/19  01:04:00     -1.6
50  2014/07/19  01:06:00     -1.6
...

All variables are factors. The aim is to group the variable "time" into hourly intervals and calculate the mean of "glob.rad" for each hour, giving a result of this shape:

    date        time         glob.rad
1   2014/07/19  00:00:00     -1.6
2   2014/07/19  01:00:00     -1.6
3   2014/07/19  02:00:00     -1.6
...

I know how to deal with POSIXct date-time data, but I do not know how to deal with time stored as a factor. So far I have tried cut(), subset() and as.numeric(), but none of them worked.

Paulo E. Cardoso · Accepted Answer

I like the semantics of dplyr with pipes (%>%). It is pretty much like reading a sentence.

tab <- structure(list(date = c("2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19"), time = c("00:00:00", "00:02:00", 
"00:03:00", "00:04:00", "00:06:00", "00:07:00", "00:08:00", "00:10:00", 
"00:11:00", "00:12:00", "00:14:00", "00:15:00", "00:16:00", "00:18:00", 
"00:19:00", "00:20:00", "00:22:00", "00:23:00", "00:24:00", "00:26:00", 
"00:27:00", "00:28:00", "00:30:00", "00:31:00", "00:32:00", "00:34:00", 
"00:35:00", "00:36:00", "00:38:00", "00:39:00", "00:40:00", "00:42:00", 
"00:43:00", "00:44:00", "00:46:00", "00:47:00", "00:48:00", "00:50:00", 
"00:51:00", "00:52:00", "00:54:00", "00:55:00", "00:56:00", "00:58:00", 
"00:59:00", "01:00:00", "01:02:00", "01:03:00", "01:04:00", "01:06:00"
), glob.rad = c(-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, 
-1.6, -1.6, -1.6, -1.6, -1.6, -1.5, -1.5, -1.4, -1.4, -1.3, -1.3, 
-1.3, -1.3, -1.3, -1.3, -1.4, -1.4, -1.5, -1.5, -1.6, -1.6, -1.6, 
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, 
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6)), .Names = c("date", 
"time", "glob.rad"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50"))


#> head(tab)
#        date     time glob.rad
#1 2014/07/19 00:00:00     -1.6
#2 2014/07/19 00:02:00     -1.6
#3 2014/07/19 00:03:00     -1.6
#4 2014/07/19 00:04:00     -1.6
#5 2014/07/19 00:06:00     -1.6
#6 2014/07/19 00:07:00     -1.6

library(lubridate)
library(dplyr)

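# Combine date and time into a single date-time, then extract the hour of day
tab$date <- ymd_hms(paste(tab$date, tab$time))
tab$hour <- hour(tab$date)
#head(tab)
# Mean of glob.rad for each hour
tab %>%
  group_by(hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))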

#Source: local data frame [2 x 2]
#
#  hour       avg
#1    0 -1.533333
#2    1 -1.600000
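
Since the question mentions cut(), here is a minimal base R sketch of the same idea without extra packages. It assumes all columns are factors, as the question states. The small data frame df, the UTC time zone and the column names datetime and hour are illustrative assumptions, not part of the original data.

```r
# Small stand-in for the question's data; every column is a factor
df <- data.frame(
  date     = factor(c("2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19")),
  time     = factor(c("00:00:00", "00:30:00", "01:00:00", "01:02:00")),
  glob.rad = factor(c("-1.6", "-1.4", "-1.6", "-1.6"))
)

# Go through as.character() first: as.numeric() on a factor returns
# the internal level codes, not the printed values
df$glob.rad <- as.numeric(as.character(df$glob.rad))

# Combine date and time into one POSIXct column (UTC is an assumption)
df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# cut() assigns each timestamp to the hour it falls in
df$hour <- cut(df$datetime, breaks = "hour")

# Mean of glob.rad per hourly bin
hourly <- aggregate(glob.rad ~ hour, data = df, FUN = mean)
hourly
```

The as.character() step is likely why as.numeric() alone appeared not to work on the factor columns.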

If your data span more than one day and you want to summarise glob.rad by day and hour, a simple option is to create a new variable that extracts the day from your date column

tab$day <- day(tab$date)

and add it to your grouping:

tab %>%
  group_by(day, hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))

#Source: local data frame [2 x 3]
#Groups: day
#
#  day hour       avg
#1  19    0 -1.533333
#2  19    1 -1.600000
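
Note that day() returns only the day of the month, so readings from the same day number in different months would still share a group. One possible alternative, sketched here, is to round each timestamp down to the start of its hour with lubridate's floor_date() and group on that single column. The data frame obs and the column names datetime and hour_start are hypothetical and used only for illustration.

```r
library(lubridate)
library(dplyr)

# Hypothetical readings spanning two days, so that hour 0 occurs twice
obs <- data.frame(
  date     = c("2014/07/19", "2014/07/19", "2014/07/20", "2014/07/20"),
  time     = c("00:00:00", "00:30:00", "00:10:00", "01:00:00"),
  glob.rad = c(-1.6, -1.4, -1.0, -1.2)
)

hourly <- obs %>%
  # Parse the full date-time, then floor it to the start of its hour
  mutate(datetime   = ymd_hms(paste(date, time)),
         hour_start = floor_date(datetime, unit = "hour")) %>%
  # One group per calendar hour, so the same hour on different days stays separate
  group_by(hour_start) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))
hourly
```

The result has one row per calendar hour with its start time, which is close to the date/time layout asked for in the question.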

sessionInfo()
#R version 3.2.2 (2015-08-14)
#...
#other attached packages:
#[1] lubridate_1.3.3 dplyr_0.4.2
