chm

Reputation: 41

How to subset time data (factor) into hourly intervals

I have a data.frame with 1080 rows and the variables date, time and glob.rad:

      date      time         glob.rad
1   2014/07/19  00:00:00     -1.6
2   2014/07/19  00:02:00     -1.6
3   2014/07/19  00:03:00     -1.6
4   2014/07/19  00:04:00     -1.6
5   2014/07/19  00:06:00     -1.6
6   2014/07/19  00:07:00     -1.6
7   2014/07/19  00:08:00     -1.6
8   2014/07/19  00:10:00     -1.6
9   2014/07/19  00:11:00     -1.6
10  2014/07/19  00:12:00     -1.6
11  2014/07/19  00:14:00     -1.6
12  2014/07/19  00:15:00     -1.6
13  2014/07/19  00:16:00     -1.6
14  2014/07/19  00:18:00     -1.5
15  2014/07/19  00:19:00     -1.5
16  2014/07/19  00:20:00     -1.4
17  2014/07/19  00:22:00     -1.4
18  2014/07/19  00:23:00     -1.3
19  2014/07/19  00:24:00     -1.3
20  2014/07/19  00:26:00     -1.3
21  2014/07/19  00:27:00     -1.3
22  2014/07/19  00:28:00     -1.3
23  2014/07/19  00:30:00     -1.3
24  2014/07/19  00:31:00     -1.4
25  2014/07/19  00:32:00     -1.4
26  2014/07/19  00:34:00     -1.5
27  2014/07/19  00:35:00     -1.5
28  2014/07/19  00:36:00     -1.6
29  2014/07/19  00:38:00     -1.6
30  2014/07/19  00:39:00     -1.6
31  2014/07/19  00:40:00     -1.6
32  2014/07/19  00:42:00     -1.6
33  2014/07/19  00:43:00     -1.6
34  2014/07/19  00:44:00     -1.6
35  2014/07/19  00:46:00     -1.6
36  2014/07/19  00:47:00     -1.6
37  2014/07/19  00:48:00     -1.6
38  2014/07/19  00:50:00     -1.6
39  2014/07/19  00:51:00     -1.6
40  2014/07/19  00:52:00     -1.6
41  2014/07/19  00:54:00     -1.6
42  2014/07/19  00:55:00     -1.6
43  2014/07/19  00:56:00     -1.6
44  2014/07/19  00:58:00     -1.6
45  2014/07/19  00:59:00     -1.6
46  2014/07/19  01:00:00     -1.6
47  2014/07/19  01:02:00     -1.6
48  2014/07/19  01:03:00     -1.6
49  2014/07/19  01:04:00     -1.6
50  2014/07/19  01:06:00     -1.6
... 

All variables are factors. The aim is to group the variable "time" into hourly intervals in order to calculate the mean of "glob.rad" for each hour. The desired result looks like this:

    date        time         glob.rad
1   2014/07/19  00:00:00     -1.6
2   2014/07/19  01:00:00     -1.6
3   2014/07/19  02:00:00     -1.6
...

I know how to handle date-times stored as POSIXct, but I do not know how to handle a time stored as a factor. So far I have tried cut(), subset() and as.numeric(), but none of them worked.

Upvotes: 0

Views: 401

Answers (2)

Rich Scriven

Reputation: 99331

You don't have to work with the time as a factor. You could, but pasting the date and time columns together and grouping on the result is less stressful. The data.table package makes this pretty easy, because it has functions for extracting the parts of a POSIX/Date object, and we can use those parts for our groupings.

library(data.table)
setDT(df)[, .(mean = mean(glob.rad)), by = hour(paste(date, time))]
#    hour      mean
# 1:    0 -1.533333
# 2:    1 -1.600000

The original data remains unchanged, other than being converted to a data.table. If you want both the date and the hour in the result, you can do:

df[, .(mean = mean(glob.rad)), by = .(date, hour(paste(date, time)))]
#          date hour      mean
# 1: 2014/07/19    0 -1.533333
# 2: 2014/07/19    1 -1.600000

This last block does actually use a factor in the date column, as I didn't find it necessary to change that to a date classed column.
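Since the question says every column is a factor, glob.rad probably needs converting before mean() gives a useful result. Here is a minimal base-R sketch of that step, assuming the values look like those in the question. The small data frame `df`, the `datetime` and `hour_start` columns, and the UTC time zone are my own illustrative choices, not part of the original data. Note that as.numeric() applied directly to a factor returns the integer level codes rather than the displayed values, which may be why that attempt appeared not to work.

```r
# Hypothetical sample that mimics the all-factor data frame from the question
df <- data.frame(
  date     = factor(rep("2014/07/19", 4)),
  time     = factor(c("00:00:00", "00:30:00", "01:00:00", "01:30:00")),
  glob.rad = factor(c("-1.6", "-1.4", "-1.5", "-1.3"))
)

# Go through character first; as.numeric() on a factor yields level codes
df$glob.rad <- as.numeric(as.character(df$glob.rad))

# paste() uses the factor labels, so the result can be parsed as a date-time
# (tz = "UTC" is an assumption; use the time zone of your measurements)
df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Once the column is POSIXct, cut() can bin it into hourly intervals
df$hour_start <- cut(df$datetime, breaks = "hour")

# Mean of glob.rad within each hourly bin
hourly <- aggregate(glob.rad ~ hour_start, data = df, FUN = mean)
print(hourly)
```

Because the bins come from the full date-time, this keeps the same hour on different days apart without needing a separate date grouping.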

Upvotes: 2

Paulo E. Cardoso

Reputation: 5856

I like the semantics of dplyr with pipes (%>%). It is pretty much like reading a sentence.

tab <- structure(list(date = c("2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", 
"2014/07/19", "2014/07/19"), time = c("00:00:00", "00:02:00", 
"00:03:00", "00:04:00", "00:06:00", "00:07:00", "00:08:00", "00:10:00", 
"00:11:00", "00:12:00", "00:14:00", "00:15:00", "00:16:00", "00:18:00", 
"00:19:00", "00:20:00", "00:22:00", "00:23:00", "00:24:00", "00:26:00", 
"00:27:00", "00:28:00", "00:30:00", "00:31:00", "00:32:00", "00:34:00", 
"00:35:00", "00:36:00", "00:38:00", "00:39:00", "00:40:00", "00:42:00", 
"00:43:00", "00:44:00", "00:46:00", "00:47:00", "00:48:00", "00:50:00", 
"00:51:00", "00:52:00", "00:54:00", "00:55:00", "00:56:00", "00:58:00", 
"00:59:00", "01:00:00", "01:02:00", "01:03:00", "01:04:00", "01:06:00"
), glob.rad = c(-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, 
-1.6, -1.6, -1.6, -1.6, -1.6, -1.5, -1.5, -1.4, -1.4, -1.3, -1.3, 
-1.3, -1.3, -1.3, -1.3, -1.4, -1.4, -1.5, -1.5, -1.6, -1.6, -1.6, 
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, 
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6)), .Names = c("date", 
"time", "glob.rad"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50"))


#> head(tab)
#        date     time glob.rad
#1 2014/07/19 00:00:00     -1.6
#2 2014/07/19 00:02:00     -1.6
#3 2014/07/19 00:03:00     -1.6
#4 2014/07/19 00:04:00     -1.6
#5 2014/07/19 00:06:00     -1.6
#6 2014/07/19 00:07:00     -1.6

library(lubridate)
library(dplyr)

tab$date <- ymd_hms(paste(tab$date, tab$time))
tab$hour <- hour(tab$date)
#head(tab)
tab %>%
  group_by(hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))

#Source: local data frame [2 x 2]
#
#  hour       avg
#1    0 -1.533333
#2    1 -1.600000

If you want to summarise glob.rad by day and hour, a simple option is to create a new variable that extracts the day from your date column

tab$day <- day(tab$date)

and add it to your grouping

tab %>%
  group_by(day, hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))

#Source: local data frame [2 x 3]
#Groups: day
#
#  day hour       avg
#1  19    0 -1.533333
#2  19    1 -1.600000
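One caveat with day(): it returns the day of the month, so data covering more than one month would pool, for example, 19 July and 19 August into the same group. A possible alternative, sketched below, is to group on the timestamp rounded down to the hour with lubridate's floor_date(). The `tab2` data frame and the `datetime` and `hour_start` columns are made up for illustration.

```r
library(lubridate)
library(dplyr)

# Hypothetical data spanning two months, same day of month and same hour
tab2 <- data.frame(
  date     = c("2014/07/19", "2014/07/19", "2014/08/19", "2014/08/19"),
  time     = c("00:00:00", "00:30:00", "00:00:00", "00:30:00"),
  glob.rad = c(-1.6, -1.4, 2.0, 4.0),
  stringsAsFactors = FALSE
)

# Parse the combined date and time, then round down to the start of the hour
tab2$datetime   <- ymd_hms(paste(tab2$date, tab2$time))
tab2$hour_start <- floor_date(tab2$datetime, "hour")

# One group per calendar hour, so July and August stay separate
hourly <- tab2 %>%
  group_by(hour_start) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))

print(hourly)
```

Grouping by day and hour on this sample would give a single row, whereas grouping on hour_start gives one row per month.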

sessionInfo()
#R version 3.2.2 (2015-08-14)
#...
#other attached packages:
#[1] lubridate_1.3.3 dplyr_0.4.2

Upvotes: 1
