Fern

Reputation: 41

Averages by Time in R

I have compound concentrations measured every second, and I want to compute 30-second and 60-second averages. I have read the posts here and tried lubridate and dplyr, but I have not been able to make it work. I am transitioning from SAS to R, so please be patient.

This is my data:

head(data)  # show the first 6 rows

      Date     Time  Temp      Appb    Bppb    Cppb     Dppb    Eppb      Fppb
1 10/30/17 21:32:33 25.23 -0.469304 22.4445 35.5993 -18.4843 52.0488 -2.947340
2 10/30/17 21:32:34 25.23 -1.255780 21.8248 34.2364 -20.9051 47.4344 -2.071230
3 10/30/17 21:32:35 25.23 -0.769233 21.1590 30.5892 -20.9347 42.6061 -0.991607
4 10/30/17 21:32:36 25.23 -0.874262 21.3353 25.4841 -19.6127 38.3224 -0.452383
5 10/30/17 21:32:37 25.24 -0.819439 21.1916 21.4919 -16.5991 36.1331 -0.150002
6 10/30/17 21:32:38 25.24 -1.895730 21.5345 18.0576 -17.2539 31.7448 -0.311064

Upvotes: 3

Views: 1253

Answers (3)

Gautam

Reputation: 2753

Here is a data.table and lubridate approach for completeness.

library(data.table)
library(lubridate)

dat <- read.table(text = "Date     Time  Temp      Appb    Bppb    Cppb     Dppb    Eppb      Fppb
                          1 10/30/17 21:32:33 25.23 -0.469304 22.4445 35.5993 -18.4843 52.0488 -2.947340   
                          2 10/30/17 21:32:34 25.23 -1.255780 21.8248 34.2364 -20.9051 47.4344 -2.071230  
                          3 10/30/17 21:32:35 25.23 -0.769233 21.1590 30.5892 -20.9347 42.6061 -0.991607  
                          4 10/30/17 21:32:36 25.23 -0.874262 21.3353 25.4841 -19.6127 38.3224 -0.452383  
                          5 10/30/17 21:32:37 25.24 -0.819439 21.1916 21.4919 -16.5991 36.1331 -0.150002  
                          6 10/30/17 21:32:38 25.24 -1.895730 21.5345 18.0576 -17.2539 31.7448 -0.311064   ",
                  header = TRUE, stringsAsFactors = FALSE)

# combine Date and Time into a single POSIXct date-time
dat$tme <- as.POSIXct(paste(dat$Date, dat$Time), format = "%m/%d/%y %H:%M:%S", tz = "America/Montreal")

#convert to data.table
dat <- as.data.table(dat)

#drop Date and Time since we have an R date object now
dat <- dat[,-c(1,2)]

#result
dat[, lapply(.SD, mean), .(tme = round_date(tme, "3 seconds"))]

I rounded to 3 seconds because all of the sample data falls within a single 30-second window (as in the answer above). On the full data, pass "30 seconds" or "1 minute" to round_date instead.

Here are the results:

                   tme     Temp      Appb     Bppb     Cppb      Dppb     Eppb       Fppb
1: 2017-10-30 21:32:33 25.23000 -0.862542 22.13465 34.91785 -19.69470 49.74160 -2.5092850
2: 2017-10-30 21:32:36 25.23333 -0.820978 21.22863 25.85507 -19.04883 39.02053 -0.5313307
3: 2017-10-30 21:32:39 25.24000 -1.895730 21.53450 18.05760 -17.25390 31.74480 -0.3110640
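
For the 30-second and 60-second averages the question asks for, only the unit needs to change. Here is a minimal sketch on made-up one-second readings (the `demo` table and its values are assumptions for illustration, not the original data). It uses `floor_date()` rather than `round_date()`, so each bucket is labelled by its start time instead of the nearest multiple; which of the two is appropriate depends on how you want the windows aligned.

```r
library(data.table)
library(lubridate)

# Hypothetical one-second readings covering two minutes (illustration only)
demo <- data.table(
  tme  = as.POSIXct("2017-10-30 21:32:00", tz = "UTC") + 0:119,
  Temp = 25 + (0:119) / 100,
  Appb = rep(c(-1, 1), times = 60)
)

# Columns to average, listed explicitly via .SDcols
meas <- c("Temp", "Appb")

# Mean per 30-second bucket and per 60-second bucket
avg30 <- demo[, lapply(.SD, mean), by = .(tme = floor_date(tme, "30 seconds")), .SDcols = meas]
avg60 <- demo[, lapply(.SD, mean), by = .(tme = floor_date(tme, "1 minute")), .SDcols = meas]

print(avg30)
print(avg60)
```

With two minutes of data starting on a minute boundary, this gives four 30-second rows and two 60-second rows.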

I personally prefer the data.table approach, especially for larger datasets, because of its speed and how convenient it makes subsetting and grouped operations.

Upvotes: 1

Adiel Loinger

Reputation: 199

Well, you can do the following:

data$time_bucket <- 
  as.POSIXct(round(as.numeric(as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%y %H:%M:%S"))/30)*30, origin='1970-01-01')

This might seem a bit involved but it does the following:

  1. as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%y %H:%M:%S") pastes the date and time columns together and parses them into one "datetime" object.
  2. as.numeric converts it to an "epoch" number: the number of seconds since 1970-01-01.
  3. Dividing by 30, rounding, and multiplying by 30 creates buckets of 30 seconds. All times that round to the same number get the same "label" after the rounding. Use 60 instead of 30 for 60-second buckets.
  4. Finally, as.POSIXct with origin='1970-01-01' converts the number back to a "datetime".
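
The four steps above can also be written out one at a time. This is a minimal sketch on made-up timestamps (the `stamps` vector and the fixed UTC time zone are assumptions for illustration). It also shows `floor()` next to `round()`: rounding centres each bucket on its label, while flooring labels each bucket by its start.

```r
# Hypothetical timestamps, one per second for 90 seconds (illustration only)
stamps <- format(
  as.POSIXct("2017-10-30 21:32:00", tz = "UTC") + 0:89,
  "%m/%d/%y %H:%M:%S"
)

# Step 1: parse the text into a datetime (UTC is an assumption here)
dt <- as.POSIXct(stamps, format = "%m/%d/%y %H:%M:%S", tz = "UTC")

# Step 2: seconds since 1970-01-01
secs <- as.numeric(dt)

# Step 3: snap every value to a multiple of 30 seconds
rounded <- round(secs / 30) * 30  # nearest multiple; exact .5 ties go to the even multiple
floored <- floor(secs / 30) * 30  # previous multiple, so the label is the bucket start

# Step 4: convert back to datetime
bucket_round <- as.POSIXct(rounded, origin = "1970-01-01", tz = "UTC")
bucket_floor <- as.POSIXct(floored, origin = "1970-01-01", tz = "UTC")

# Rows per bucket under each scheme
print(table(format(bucket_floor, "%H:%M:%S")))
print(table(format(bucket_round, "%H:%M:%S")))
```

With `floor()` the 90 rows fall into three buckets of 30 rows each. With `round()` they are spread over four labels, because the first and last buckets are only half full.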

After you have done all this you can just take the average by time bucket, for example using dplyr:

library(dplyr)

data %>% group_by(time_bucket) %>%
  summarize(Temp = mean(Temp))
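
The call above averages only Temp. To average several columns at once, and to get both window sizes, one option is `across()`. The sketch below assumes dplyr 1.0.0 or later; the synthetic `df` and the helper `bucket_mean()` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical one-second readings covering two minutes (illustration only)
df <- data.frame(
  stamp = as.POSIXct("2017-10-30 21:32:00", tz = "UTC") + 0:119,
  Temp  = 25 + (0:119) / 100,
  Appb  = rep(c(-1, 1), times = 60)
)

# Hypothetical helper: mean of the listed columns per `width`-second bucket.
# floor() labels each bucket by its start time.
bucket_mean <- function(d, width) {
  d %>%
    mutate(time_bucket = as.POSIXct(floor(as.numeric(stamp) / width) * width,
                                    origin = "1970-01-01", tz = "UTC")) %>%
    group_by(time_bucket) %>%
    summarise(across(c(Temp, Appb), mean), .groups = "drop")
}

avg30 <- bucket_mean(df, 30)
avg60 <- bucket_mean(df, 60)

print(avg30)
print(avg60)
```

On the real data you would list all of the measurement columns (Temp through Fppb) inside `across()`.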

Hope this answers your question.

Upvotes: 4

acylam

Reputation: 18691

Here is another solution with period.apply from xts:

library(lubridate)
library(xts)

data_ts = as.xts(data[-c(1:2)], mdy_hms(paste(data$Date, data$Time)))

ep = endpoints(data_ts, 'seconds', k = 30)

period.apply(data_ts, ep, FUN = mean)

Result:

                        Temp      Appb     Bppb     Cppb      Dppb    Eppb      Fppb
2017-10-30 21:32:38 25.23333 -1.013958 21.58162 27.57642 -18.96497 41.3816 -1.153938

Since all your sample data is within 30 seconds, you only get one average for each column. To verify that my answer actually works, you can try a 2-second mean:

test_ep = endpoints(data_ts, 'seconds', k = 2)

period.apply(data_ts, test_ep, FUN = mean)

Result:

                      Temp       Appb     Bppb    Cppb     Dppb     Eppb       Fppb
2017-10-30 21:32:33 25.230 -0.4693040 22.44450 35.5993 -18.4843 52.04880 -2.9473400
2017-10-30 21:32:35 25.230 -1.0125065 21.49190 32.4128 -20.9199 45.02025 -1.5314185
2017-10-30 21:32:37 25.235 -0.8468505 21.26345 23.4880 -18.1059 37.22775 -0.3011925
2017-10-30 21:32:38 25.240 -1.8957300 21.53450 18.0576 -17.2539 31.74480 -0.3110640
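
For a 60-second mean, only `k` changes. Here is a minimal sketch on synthetic data (the `demo` object and its values are assumptions for illustration). It passes `colMeans` instead of `mean` so that the per-column averaging is explicit and does not depend on how `mean()` treats a multi-column xts object.

```r
library(xts)

# Hypothetical one-second readings covering two minutes (illustration only)
idx  <- as.POSIXct("2017-10-30 21:32:00", tz = "UTC") + 0:119
demo <- xts(cbind(Temp = 25 + (0:119) / 100,
                  Appb = rep(c(-1, 1), times = 60)),
            order.by = idx)

# Row positions that close each 60-second window
ep60 <- endpoints(demo, on = "seconds", k = 60)

# Column-wise mean within each window
avg60 <- period.apply(demo, ep60, FUN = colMeans)

print(avg60)
```

As in the results above, each row is stamped with the last observation in its window.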

Data:

data = read.table(text = "   Date     Time  Temp      Appb    Bppb    Cppb     Dppb    Eppb      Fppb
                  1 10/30/17 21:32:33 25.23 -0.469304 22.4445 35.5993 -18.4843 52.0488 -2.947340   
                  2 10/30/17 21:32:34 25.23 -1.255780 21.8248 34.2364 -20.9051 47.4344 -2.071230  
                  3 10/30/17 21:32:35 25.23 -0.769233 21.1590 30.5892 -20.9347 42.6061 -0.991607  
                  4 10/30/17 21:32:36 25.23 -0.874262 21.3353 25.4841 -19.6127 38.3224 -0.452383  
                  5 10/30/17 21:32:37 25.24 -0.819439 21.1916 21.4919 -16.5991 36.1331 -0.150002  
                  6 10/30/17 21:32:38 25.24 -1.895730 21.5345 18.0576 -17.2539 31.7448 -0.311064", 
                  header = TRUE, stringsAsFactors = FALSE)

Upvotes: 0
