Blaze Tama

Reputation: 10948

R Sum rows by hourly rate

I'm getting started with R, so please bear with me.

For example, I have this data.table (or data.frame) object:

Time               Station      count_starts    count_ends
01/01/2015 00:30       A            2               3
01/01/2015 00:40       A            2               1
01/01/2015 00:55       B            1               1
01/01/2015 01:17       A            3               1
01/01/2015 01:37       A            1               1

My end goal is to bin the "Time" column into hourly intervals and sum count_starts and count_ends for each hour and station:

Time               Station      sum(count_starts)   sum(count_ends)
01/01/2015 01:00       A            4                      4
01/01/2015 01:00       B            1                      1
01/01/2015 02:00       A            4                      2

I did some research and found out that I should use the xts library.

Thanks for helping me out

UPDATE:

I converted the type of transactions$Time to POSIXct, so the xts package should be able to use the timeseries directly.

Upvotes: 0

Views: 1132

Answers (3)

Rohit Mishra

Reputation: 571

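Two steps are required. 1) Round Time down to the start of its one-hour window (floor_date labels each bin by its starting hour; ceiling_date would label it by its ending hour, as in the question's expected output):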

library(data.table)
library(lubridate)

data=data.table(Time=c('01/01/2015 00:30','01/01/2015 00:40','01/01/2015 00:55',
                       '01/01/2015 01:17','01/01/2015 01:37'),
                Station=c('A','A','B','A','A'),
                count_starts=c(2,2,1,3,1),
                count_ends=c(3,1,1,1,1))
data[,Time_conv:=as.POSIXct(strptime(Time,'%d/%m/%Y %H:%M'))]
data[,Time_round:=floor_date(Time_conv,unit="1 hour")]

2) Aggregate the table obtained above by the rounded time and the station to get the desired result:

New_data=data[,list(count_starts_sum=sum(count_starts),count_ends_sum=sum(count_ends)),by=list(Time_round,Station)]
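As a variation on the two steps above, the following sketch labels each row by the end of its hour using lubridate's ceiling_date, which matches the labels in the question's expected output. It assumes day/month/year dates and a UTC time zone; the names Time_end and hourly are only illustrative.

```r
library(data.table)
library(lubridate)

data <- data.table(
  Time = c("01/01/2015 00:30", "01/01/2015 00:40", "01/01/2015 00:55",
           "01/01/2015 01:17", "01/01/2015 01:37"),
  Station = c("A", "A", "B", "A", "A"),
  count_starts = c(2, 2, 1, 3, 1),
  count_ends = c(3, 1, 1, 1, 1)
)

# Parse the strings (assumed day/month/year) into POSIXct.
data[, Time_conv := as.POSIXct(Time, format = "%d/%m/%Y %H:%M", tz = "UTC")]

# Round each timestamp up to the next full hour.
data[, Time_end := ceiling_date(Time_conv, unit = "hour")]

# Sum both count columns per hour and station.
hourly <- data[, .(count_starts_sum = sum(count_starts),
                   count_ends_sum = sum(count_ends)),
               by = .(Time_end, Station)]
print(hourly)
```

Grouping with by keeps the groups in order of first appearance, so add an explicit sort (for example with setorder) if a particular row order matters.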

Upvotes: 0

Onyambu

Reputation: 79338

The same aggregation works in base R. The only difference is that cut() labels each hourly bin by its start, so every hour is one less than in the question's expected output:

 dat=read.table(text = "Time               Station      count_starts    count_ends
 '01/01/2015 00:30'       A            2               3
 '01/01/2015 00:40'       A            2               1
 '01/01/2015 00:55'       B            1               1
 '01/01/2015 01:17'       A            3               1
 '01/01/2015 01:37'       A            1               1",
             header = TRUE, stringsAsFactors = FALSE)

 dat$Time=cut(strptime(dat$Time,"%m/%d/%Y %H:%M"),"hour")
 aggregate(.~Time+Station,dat,sum)
                  Time Station count_starts count_ends
 1 2015-01-01 00:00:00       A            4          4
 2 2015-01-01 01:00:00       A            4          2
 3 2015-01-01 00:00:00       B            1          1

To list the rows chronologically, reorder the result with the order function (Time is now a factor whose levels are in time order):

 m=aggregate(.~Time+Station,dat,sum)
 m[order(m[,1]),]
                  Time Station count_starts count_ends
 1 2015-01-01 00:00:00       A            4          4
 3 2015-01-01 00:00:00       B            1          1
 2 2015-01-01 01:00:00       A            4          2
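If the labels should be the end of each hour, as in the question, one possible base R sketch rounds the epoch seconds up to the next multiple of 3600 instead of calling cut(). It assumes day/month/year dates and a UTC time zone; the variable names t and res are only illustrative.

```r
dat <- data.frame(
  Time = c("01/01/2015 00:30", "01/01/2015 00:40", "01/01/2015 00:55",
           "01/01/2015 01:17", "01/01/2015 01:37"),
  Station = c("A", "A", "B", "A", "A"),
  count_starts = c(2, 2, 1, 3, 1),
  count_ends = c(3, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the strings (assumed day/month/year) into POSIXct.
t <- as.POSIXct(dat$Time, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Round seconds since the epoch up to the next full hour (3600 seconds).
dat$Time <- as.POSIXct(ceiling(as.numeric(t) / 3600) * 3600,
                       origin = "1970-01-01", tz = "UTC")

# Sum both count columns per hour and station, then sort chronologically.
res <- aggregate(cbind(count_starts, count_ends) ~ Time + Station,
                 data = dat, FUN = sum)
res <- res[order(res$Time, res$Station), ]
print(res)
```

A timestamp that falls exactly on the hour is left unchanged by ceiling(), so it stays in the bin that ends at that hour.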

Upvotes: 1

www

Reputation: 39174

A solution using dplyr and lubridate. The key is to use ceiling_date to round each date-time up to the next full hour, and then to group by hour and station and summarise the data.

library(dplyr)
library(lubridate)

dt2 <- dt %>%
  mutate(Time = mdy_hm(Time)) %>%
  mutate(Time = ceiling_date(Time, unit = "hour")) %>%
  group_by(Time, Station) %>%
  summarise(`sum(count_starts)` = sum(count_starts),
            `sum(count_ends)` = sum(count_ends)) %>%
  ungroup()
dt2
# # A tibble: 3 x 4
#                  Time Station `sum(count_starts)` `sum(count_ends)`
#                <dttm>   <chr>               <int>             <int>
# 1 2015-01-01 01:00:00       A                   4                 4
# 2 2015-01-01 01:00:00       B                   1                 1
# 3 2015-01-01 02:00:00       A                   4                 2

DATA

dt <- read.table(text = "Time               Station      count_starts    count_ends
'01/01/2015 00:30'       A            2               3
'01/01/2015 00:40'       A            2               1
'01/01/2015 00:55'       B            1               1
'01/01/2015 01:17'       A            3               1
'01/01/2015 01:37'       A            1               1",
                 header = TRUE, stringsAsFactors = FALSE)

Explanation

mdy_hm converts a string to a date-time class. The name means "month-day-year hour-minute", so it has to match the layout of the string (use dmy_hm if the day comes first). ceiling_date rounds a date-time object up to the unit specified. group_by defines the grouping variables, and summarise computes one summary row per group.
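To make the rounding direction concrete, here is a small sketch comparing floor_date and ceiling_date on two sample timestamps, one inside an hour and one exactly on the hour. The sample values and the names x, down and up are illustrative, and change_on_boundary is set explicitly so the boundary behaviour does not depend on the default.

```r
library(lubridate)

# Two sample date-times: one mid-hour, one exactly on the hour.
x <- ymd_hm(c("2015-01-01 00:30", "2015-01-01 01:00"), tz = "UTC")

# Round down to the start of the hour.
down <- floor_date(x, unit = "hour")

# Round up to the next hour, leaving exact hours untouched.
up <- ceiling_date(x, unit = "hour", change_on_boundary = FALSE)

print(data.frame(x = x, down = down, up = up))
```

With change_on_boundary = FALSE, a reading at exactly 01:00 stays in the bin labelled 01:00; setting it to TRUE would move that reading to 02:00 instead.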

Upvotes: 1
