phaser
phaser

Reputation: 625

Averaging replicates by id and date

I have data stored in long format:

    datetime <- c("2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00",
          "2016-04-03 00:00:00", "2016-04-03 00:15:00", "2016-04-03 00:30:00")
    temp <- c(46.7, 47.8, 46.2, 72.0, 71.1, 70.6)
    humid <- c(63.8, 60.3, 64.4, 24.1, 24.2, 24.3)
    id <- c("RSOSW_16", "RSOSW_16", "RSOSW_16", "RSOSE_16", "RSOSE_16", "RSOSE_16")
    df <- data.frame(datetime,temp,humid,id)

I would like to average the replicates I have by identifying the replicates by id and averaging the values under those ids by date time. Here I have two replicates. The outcome I would like is the following:

             datetime    temp   humid         id       
  2016-04-03 00:00:00   59.35   43.95   RSOSW_16
  2016-04-03 00:15:00   59.45   42.25   RSOSW_16
  2016-04-03 00:30:00    58.4   44.35   RSOSW_16

I have several other columns of information that I would like to keep in the data frame. Thanks in advance.

Upvotes: 1

Views: 117

Answers (3)

akrun
akrun

Reputation: 887223

We can use summarise_each in dplyr

library(dplyr)
as.data.frame(df, stringsAsFactors=FALSE) %>%
             group_by(datetime) %>%
             summarise_each(funs(mean(as.numeric(.))), temp, humid) 
#             datetime  temp humid
#                <chr> <dbl> <dbl>
#1 2016-04-03 00:00:00 59.35 43.95
#2 2016-04-03 00:15:00 59.45 42.25
#3 2016-04-03 00:30:00 58.40 44.35

Upvotes: 0

dww
dww

Reputation: 31452

Using the data.table package, we can group averages by datetime and id by:

dt=data.table(datetime, temp,humid,id)
dt[, .(temp = mean(temp), humid=mean(humid)), by=.(datetime,id)]
#              datetime       id temp humid
#1: 2016-04-03 00:00:00 RSOSW_16 46.7  63.8
#2: 2016-04-03 00:15:00 RSOSW_16 47.8  60.3
#3: 2016-04-03 00:30:00 RSOSW_16 46.2  64.4
#4: 2016-04-03 00:00:00 RSOSE_16 72.0  24.1
#5: 2016-04-03 00:15:00 RSOSE_16 71.1  24.2
#6: 2016-04-03 00:30:00 RSOSE_16 70.6  24.3

Or, grouping just by datetime, we get

dt[, .(temp = mean(temp), humid=mean(humid)), by=.(datetime)]
#              datetime  temp humid
#1: 2016-04-03 00:00:00 59.35 43.95
#2: 2016-04-03 00:15:00 59.45 42.25
#3: 2016-04-03 00:30:00 58.40 44.35

Upvotes: 1

Hack-R
Hack-R

Reputation: 23200

If you want to ignore the ID's (this will match the expected result you gave):

df       <- as.data.frame(cbind(datetime,temp,humid,id))
df$temp  <- as.numeric(as.character(df$temp))
df$humid <- as.numeric(as.character(df$humid))

aggregate(df[,colnames(df) %in% c("temp", "humid")],by=list(df$datetime),mean)
              Group.1  temp humid
1 2016-04-03 00:00:00 59.35 43.95
2 2016-04-03 00:15:00 59.45 42.25
3 2016-04-03 00:30:00 58.40 44.35

If you want to distinguish between ID's (this will not match the expected result you gave because that result didn't separate ROSW_16 and ROSE_16):

aggregate(df[,colnames(df) %in% c("temp", "humid")],by=list(df$id, df$datetime),mean)
   Group.1             Group.2 temp humid
1 RSOSE_16 2016-04-03 00:00:00 72.0  24.1
2 RSOSW_16 2016-04-03 00:00:00 46.7  63.8
3 RSOSE_16 2016-04-03 00:15:00 71.1  24.2
4 RSOSW_16 2016-04-03 00:15:00 47.8  60.3
5 RSOSE_16 2016-04-03 00:30:00 70.6  24.3
6 RSOSW_16 2016-04-03 00:30:00 46.2  64.4

Per a comment by @thelatemail, the formula interface may be neater:

aggregate(cbind(temp,humid) ~ datetime, data=df, mean)

or just

aggregate(df[,c("temp", "humid")], df["datetime"], mean)

Upvotes: 3

Related Questions