Reputation: 41
I have a data.frame with 1080 rows and the variables date, time and glob.rad.
date time glob.rad
1 2014/07/19 00:00:00 -1.6
2 2014/07/19 00:02:00 -1.6
3 2014/07/19 00:03:00 -1.6
4 2014/07/19 00:04:00 -1.6
5 2014/07/19 00:06:00 -1.6
6 2014/07/19 00:07:00 -1.6
7 2014/07/19 00:08:00 -1.6
8 2014/07/19 00:10:00 -1.6
9 2014/07/19 00:11:00 -1.6
10 2014/07/19 00:12:00 -1.6
11 2014/07/19 00:14:00 -1.6
12 2014/07/19 00:15:00 -1.6
13 2014/07/19 00:16:00 -1.6
14 2014/07/19 00:18:00 -1.5
15 2014/07/19 00:19:00 -1.5
16 2014/07/19 00:20:00 -1.4
17 2014/07/19 00:22:00 -1.4
18 2014/07/19 00:23:00 -1.3
19 2014/07/19 00:24:00 -1.3
20 2014/07/19 00:26:00 -1.3
21 2014/07/19 00:27:00 -1.3
22 2014/07/19 00:28:00 -1.3
23 2014/07/19 00:30:00 -1.3
24 2014/07/19 00:31:00 -1.4
25 2014/07/19 00:32:00 -1.4
26 2014/07/19 00:34:00 -1.5
27 2014/07/19 00:35:00 -1.5
28 2014/07/19 00:36:00 -1.6
29 2014/07/19 00:38:00 -1.6
30 2014/07/19 00:39:00 -1.6
31 2014/07/19 00:40:00 -1.6
32 2014/07/19 00:42:00 -1.6
33 2014/07/19 00:43:00 -1.6
34 2014/07/19 00:44:00 -1.6
35 2014/07/19 00:46:00 -1.6
36 2014/07/19 00:47:00 -1.6
37 2014/07/19 00:48:00 -1.6
38 2014/07/19 00:50:00 -1.6
39 2014/07/19 00:51:00 -1.6
40 2014/07/19 00:52:00 -1.6
41 2014/07/19 00:54:00 -1.6
42 2014/07/19 00:55:00 -1.6
43 2014/07/19 00:56:00 -1.6
44 2014/07/19 00:58:00 -1.6
45 2014/07/19 00:59:00 -1.6
46 2014/07/19 01:00:00 -1.6
47 2014/07/19 01:02:00 -1.6
48 2014/07/19 01:03:00 -1.6
49 2014/07/19 01:04:00 -1.6
50 2014/07/19 01:06:00 -1.6
...
All variables are factors. The aim is to group the variable "time" into hourly intervals in order to calculate the mean of "glob.rad" within each hour. The desired result looks like this:
date time glob.rad
1 2014/07/19 00:00:00 -1.6
2 2014/07/19 01:00:00 -1.6
3 2014/07/19 02:00:00 -1.6
...
I know how to work with POSIXct date-times, but I do not know how to handle time stored as a factor.
So far I have tried cut(), subset() and as.numeric(), but none of them worked.
Upvotes: 0
Views: 401
Reputation: 99331
You could work with the time as a factor, but you don't need to: pasting the date and time columns together and grouping on the result is less stressful. The data.table package makes this pretty easy, as it has functions such as hour() for extracting the parts of a POSIXt/Date object. We can use those parts for our groupings.
library(data.table)
setDT(df)[, .(mean = mean(glob.rad)), by = hour(paste(date, time))]
# hour mean
# 1: 0 -1.533333
# 2: 1 -1.600000
The original data remains unchanged, other than being converted to a data.table. If you want both the date and the hour in the result, you can do
df[, .(mean = mean(glob.rad)), by = .(date, hour(paste(date, time)))]
# date hour mean
# 1: 2014/07/19 0 -1.533333
# 2: 2014/07/19 1 -1.600000
This last block does actually group on the factor date column, as I didn't find it necessary to convert it to a Date-classed column.
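One caveat, since the question says that all variables are factors: if glob.rad really is a factor, mean() will not work on it until it is converted to numeric. Below is a minimal base R sketch of that conversion followed by an hourly mean. The toy data frame `df`, the helper columns `datetime` and `hour_start`, and the UTC time zone are assumptions made for illustration; they are not taken from the original data.

```r
# Hypothetical toy data mimicking the question: every column is a factor
df <- data.frame(
  date     = factor(rep("2014/07/19", 4)),
  time     = factor(c("00:00:00", "00:30:00", "01:00:00", "01:06:00")),
  glob.rad = factor(c("-1.6", "-1.3", "-1.6", "-1.6"))
)

# as.numeric() on a factor returns the internal level codes,
# so convert to character first to recover the real values
df$glob.rad <- as.numeric(as.character(df$glob.rad))

# paste() turns the factors into text; parse it into one POSIXct column
# (tz = "UTC" is an assumption, use the zone your logger recorded in)
df$datetime <- as.POSIXct(paste(df$date, df$time),
                          format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Label each row with the start of its hour, then average within each label
df$hour_start <- format(df$datetime, "%Y/%m/%d %H:00:00")
hourly <- aggregate(glob.rad ~ hour_start, data = df, FUN = mean)
hourly
```

Each row of `hourly` is stamped with the start of its hour, which is close to the layout asked for in the question.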
Upvotes: 2
Reputation: 5856
I like the semantics of dplyr with pipes (%>%). It is pretty much like reading a sentence.
tab <- structure(list(date = c("2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19", "2014/07/19",
"2014/07/19", "2014/07/19"), time = c("00:00:00", "00:02:00",
"00:03:00", "00:04:00", "00:06:00", "00:07:00", "00:08:00", "00:10:00",
"00:11:00", "00:12:00", "00:14:00", "00:15:00", "00:16:00", "00:18:00",
"00:19:00", "00:20:00", "00:22:00", "00:23:00", "00:24:00", "00:26:00",
"00:27:00", "00:28:00", "00:30:00", "00:31:00", "00:32:00", "00:34:00",
"00:35:00", "00:36:00", "00:38:00", "00:39:00", "00:40:00", "00:42:00",
"00:43:00", "00:44:00", "00:46:00", "00:47:00", "00:48:00", "00:50:00",
"00:51:00", "00:52:00", "00:54:00", "00:55:00", "00:56:00", "00:58:00",
"00:59:00", "01:00:00", "01:02:00", "01:03:00", "01:04:00", "01:06:00"
), glob.rad = c(-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6,
-1.6, -1.6, -1.6, -1.6, -1.6, -1.5, -1.5, -1.4, -1.4, -1.3, -1.3,
-1.3, -1.3, -1.3, -1.3, -1.4, -1.4, -1.5, -1.5, -1.6, -1.6, -1.6,
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6,
-1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6, -1.6)), .Names = c("date",
"time", "glob.rad"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50"))
#> head(tab)
# date time glob.rad
#1 2014/07/19 00:00:00 -1.6
#2 2014/07/19 00:02:00 -1.6
#3 2014/07/19 00:03:00 -1.6
#4 2014/07/19 00:04:00 -1.6
#5 2014/07/19 00:06:00 -1.6
#6 2014/07/19 00:07:00 -1.6
library(lubridate)
library(dplyr)
# combine date and time into a single POSIXct column, then extract the hour
tab$date <- ymd_hms(paste(tab$date, tab$time))
tab$hour <- hour(tab$date)
#head(tab)
tab %>%
  group_by(hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))
#Source: local data frame [2 x 2]
#
# hour avg
#1 0 -1.533333
#2 1 -1.600000
If you want to summarise glob.rad by day and hour, the simplest option is to create a new variable that extracts the day from your date column.
tab$day <- day(tab$date)
and add it to your grouping
tab %>%
  group_by(day, hour) %>%
  summarise(avg = mean(glob.rad, na.rm = TRUE))
#Source: local data frame [2 x 3]
#Groups: day
#
#  day hour       avg
#1  19    0 -1.533333
#2  19    1 -1.600000
sessionInfo()
#R version 3.2.2 (2015-08-14)
#...
#other attached packages:
#[1] lubridate_1.3.3 dplyr_0.4.2
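The question mentions trying cut(). For what it's worth, cut() does bin by hour once the values are POSIXct rather than factors. Here is a minimal base R sketch of that route, not part of the dplyr approach above. The toy data frame `tab2`, the names `datetime`, `hour_bin` and `hourly_cut`, and the UTC time zone are assumptions made for illustration.

```r
# Hypothetical toy data in the same layout as the question
tab2 <- data.frame(
  date     = c("2014/07/19", "2014/07/19", "2014/07/19", "2014/07/20"),
  time     = c("00:00:00", "00:30:00", "01:00:00", "00:15:00"),
  glob.rad = c(-1.6, -1.3, -1.6, -1.4),
  stringsAsFactors = TRUE   # date and time become factors, as in the question
)

# paste() converts the factors to text, which as.POSIXct() can then parse
# (tz = "UTC" is an assumption)
datetime <- as.POSIXct(paste(tab2$date, tab2$time),
                       format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# cut() on a POSIXct vector bins by calendar hour; each level marks an hour's start
hour_bin <- cut(datetime, breaks = "hour")

# Mean per bin; hours without any observations come back as NA and are dropped
hourly_cut <- tapply(tab2$glob.rad, hour_bin, mean)
hourly_cut <- hourly_cut[!is.na(hourly_cut)]
hourly_cut
```

Unlike grouping on hour() alone, the bins here carry the date as well, so the same hour on different days stays separate.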
Upvotes: 1