Reputation: 625
I have data stored in long format:
datetime <- c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
"2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00")
temp <- c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6)
humid <- c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3)
id <- c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16")
df <- data.frame(datetime,temp,humid,id)
I would like to average the replicates I have by identifying the replicates by id and averaging the values under those ids by date time. Here I have two replicates. The outcome I would like is the following:
datetime temp humid id
2016-04-03 00:00:00 59.35 43.95 RSOSW_16
2016-04-03 00:15:00 59.45 42.25 RSOSW_16
2016-04-03 00:30:00 58.4 44.35 RSOSW_16
I have several other columns of information that I would like to keep in the data frame. Thanks in advance.
Upvotes: 1
Views: 117
Reputation: 887223
We can use summarise_each
in dplyr
library(dplyr)
as.data.frame(df, stringsAsFactors=FALSE) %>%
group_by(datetime) %>%
summarise_each(funs(mean(as.numeric(.))), temp, humid)
# datetime temp humid
# <chr> <dbl> <dbl>
#1 2016-04-03 00:00:00 59.35 43.95
#2 2016-04-03 00:15:00 59.45 42.25
#3 2016-04-03 00:30:00 58.40 44.35
Upvotes: 0
Reputation: 31452
Using the data.table
package, we can group averages by datetime and id by:
dt=data.table(datetime, temp,humid,id)
dt[, .(temp = mean(temp), humid=mean(humid)), by=.(datetime,id)]
# datetime id temp humid
#1: 2016-04-03 00:00:00 RSOSW_16 46.7 63.8
#2: 2016-04-03 00:15:00 RSOSW_16 47.8 60.3
#3: 2016-04-03 00:30:00 RSOSW_16 46.2 64.4
#4: 2016-04-03 00:00:00 RSOSE_16 72.0 24.1
#5: 2016-04-03 00:15:00 RSOSE_16 71.1 24.2
#6: 2016-04-03 00:30:00 RSOSE_16 70.6 24.3
Or, grouping just by datetime, we get
dt[, .(temp = mean(temp), humid=mean(humid)), by=.(datetime)]
# datetime temp humid
#1: 2016-04-03 00:00:00 59.35 43.95
#2: 2016-04-03 00:15:00 59.45 42.25
#3: 2016-04-03 00:30:00 58.40 44.35
Upvotes: 1
Reputation: 23200
If you want to ignore the ID's (this will match the expected result you gave):
df <- as.data.frame(cbind(datetime,temp,humid,id))
df$temp <- as.numeric(as.character(df$temp))
df$humid <- as.numeric(as.character(df$humid))
aggregate(df[,colnames(df) %in% c("temp", "humid")],by=list(df$datetime),mean)
Group.1 temp humid 1 2016-04-03 00:00:00 59.35 43.95 2 2016-04-03 00:15:00 59.45 42.25 3 2016-04-03 00:30:00 58.40 44.35
If you want to distinguish between ID's (this will not match the expected result you gave because that result didn't separate ROSW_16
and ROSE_16
):
aggregate(df[,colnames(df) %in% c("temp", "humid")],by=list(df$id, df$datetime),mean)
Group.1 Group.2 temp humid 1 RSOSE_16 2016-04-03 00:00:00 72.0 24.1 2 RSOSW_16 2016-04-03 00:00:00 46.7 63.8 3 RSOSE_16 2016-04-03 00:15:00 71.1 24.2 4 RSOSW_16 2016-04-03 00:15:00 47.8 60.3 5 RSOSE_16 2016-04-03 00:30:00 70.6 24.3 6 RSOSW_16 2016-04-03 00:30:00 46.2 64.4
Per a comment by @thelatemail, the formula interface may be neater:
aggregate(cbind(temp,humid) ~ datetime, data=df, mean)
or just
aggregate(df[,c("temp", "humid")], df["datetime"], mean)
Upvotes: 3