Convert hourly data to daily and bi-daily in groups of elements

Question

I know that this question is not new, but my case includes some characteristics that the previous replies cannot fully address.

I have a very big dataframe in R called 'df' (incl. 14 million elements) with the following format:

            ID               datetime    measurem
     1:    1459   2013-01-08 00:00:00        2.24
     2:    1459   2013-01-08 01:00:00        2
     3:    1459   2013-01-08 02:00:00        2.54
     4:    1459   2013-01-08 03:00:00        3.98
     5:    1459   2013-01-08 04:00:00        2
     6:    1459   2013-01-08 05:00:00        2
     7:    1459   2013-01-08 06:00:00        3
             ....
  1007:    2434   2013-01-08 00:00:00        3.45
  1008:    2434   2013-01-08 01:00:00        3
  1009:    2434   2013-01-08 02:00:00        4
  1010:    2434   2013-01-08 03:00:00        5.01
  1011:    2434   2013-01-08 04:00:00        4
            ....
  3245:    4780   2013-01-10 00:00:00        3
  3246:    4780   2013-01-10 01:00:00        4.73
  3247:    4780   2013-01-10 02:00:00        3

The structure of df is the following:

Classes ‘data.table’ and 'data.frame': 14103024 obs. of 3 variables: $ ID: chr "1459" "1459" ... $ datetime : POSIXct, format: "2013-01-08 00:00:00" "2013-01-08 01:00:00" ... $ measurem: num 2.24 2 2.54 ...

I would like to convert the energy data 'measurem' first to daily taking the sum and then to bi-daily (one measurement until 12am and the other one until 12pm), while keeping the ID column and the date. As the full dataframe is too big, I would appreciate any suggestion that could work relatively fast.

Thank you in advance!

PKumar · Accepted Answer

If I understood you correctly, then I guess you want to summarise the "measurem" column basis on ID,date and AM/PM, Since there is no sample data in the question,I have made my own to make the solution:

DATA:

 set.seed(1234)
df <- data.frame(ID=rep(1:5,4),datetime=c("2013-01-08 00:00:00", "2013-01-09 01:00:00", "2013-01-09 13:00:00", "2013-01-08 02:00:00", "2013-01-08 15:00:00",
                                         "2013-01-08 16:00:00", "2013-01-09 01:00:00", "2013-01-09 02:00:00", "2013-01-08 03:00:00", "2013-01-09 18:00:00",
                                         "2013-01-08 14:00:00", "2013-01-09 19:00:00", "2013-01-08 11:00:00", "2013-01-09 10:00:00", "2013-01-08 18:00:00",
                                         "2013-01-09 19:00:00", "2013-01-09 03:00:00", "2013-01-09 02:00:00", "2013-01-09 21:00:00",
                                         "2013-01-09 11:00:00"),measurement=abs(rnorm(20)))

Solution:

datetime <- as.POSIXlt(df$datetime)
date <- as.Date(datetime)
ind <- ifelse(datetime$hour >= 12,"PM","AM")
df$ind <- ind
df$date <- date

1) data.table way:

library(data.table)
dt <- setDT(df)
dt[,list(count = .N,sum_measure = sum(measurement)),by=list(ID,date,ind)]

2) Base R way:

fin <- aggregate(measurement ~ ID + ind + date,data=df,sum)
fin[order(fin$ID),]



 ID ind       date measurement
#  1  AM 2013-01-08  1.20706575
#  1  PM 2013-01-08  0.98324859
#  1  PM 2013-01-09  0.11028549
#  2  AM 2013-01-09  1.36317871
#  2  PM 2013-01-09  0.99838644
#  3  AM 2013-01-08  0.77625389
#  3  AM 2013-01-09  1.45782727
#  3  PM 2013-01-09  1.08444118
#  4  AM 2013-01-08  2.91014970
#  4  AM 2013-01-09  0.06445882
#  4  PM 2013-01-09  0.83717168
#  5  PM 2013-01-08  1.38861875
#  5  AM 2013-01-09  2.41583518
#  5  PM 2013-01-09  0.89003783

Convert hourly data to daily and bi-daily in groups of elements

Answers (2)

Data

Benchmark

Related Questions