Reputation: 8247
I have following dataframe in R
Date ID
01-01-2017 12:39:00 CDF
01-01-2017 01:39:00 WED
01-01-2017 02:39:00 QWE
01-01-2017 05:39:00 TYU
01-01-2017 17:39:00 ERT
02-01-2017 02:30:34 DEF
I want to calculate hourly count of ID. My desired dataframe would be
Date hours Count
01-01-2017 00:00 - 01:00 1
01-01-2017 01:00 - 02:00 1
01-01-2017 02:00 - 03:00 1
01-01-2017 03:00 - 04:00 0
01-01-2017 04:00 - 05:00 0
01-01-2017 05:00 - 06:00 1
.
01-01-2017 23:00 - 00:00 0
.
02-01-2017 12:00 - 01:00 0
02-01-2017 01:00 - 02:00 0
02-01-2017 02:00 - 03:00 1
Where there is no id exists,I want hourly bucket to be zero. Every date will contain 24 hours movement.
How can I achieve this in R?
Upvotes: 1
Views: 1353
Reputation: 14336
tidyverse
provides some functions which are useful here, like count
/tally
and complete
library(tidyverse)
library(lubridate)
dat <- read_csv('Date, ID
01-01-2017 12:39:00, CDF
01-01-2017 01:39:00, WED
01-01-2017 02:39:00, QWE
01-01-2017 05:39:00, TYU
01-01-2017 17:39:00, ERT
02-01-2017 02:30:34, DEF'
)
dat %>%
mutate(
Date = dmy_hms(Date),
day = floor_date(Date, 'day'),
hour = hour(Date)
) %>%
group_by(day, hour) %>%
tally %>%
complete(day, hour = 0:23, fill = list('n' = 0))
## A tibble: 48 x 3
## Groups: day [2]
# day hour n
# <dttm> <int> <dbl>
# 1 2017-01-01 0 0
# 2 2017-01-01 1 1
# 3 2017-01-01 2 1
# 4 2017-01-01 3 0
# 5 2017-01-01 4 0
# 6 2017-01-01 5 1
# 7 2017-01-01 6 0
# 8 2017-01-01 7 0
# 9 2017-01-01 8 0
#10 2017-01-01 9 0
## ... with 38 more rows
Upvotes: 1
Reputation: 2229
here is one approach using lubridate
and base
R.
In your provided data set, your first observation is 01-01-2017 12:39:00
, but in your desired output there is a count for 00:00 - 01:00
. In the code below,
12:39:00
would be regarded as 12:39 P.M, therefore I will assume you mean
00:39:00
. Let me know if that is not the case
library(lubridate)
# the data
txt <- "Date,ID
01-01-2017 00:39:00,CDF
01-01-2017 01:39:00,WED
01-01-2017 02:39:00,QWE
01-01-2017 05:39:00,TYU
01-01-2017 17:39:00,ERT
02-01-2017 02:30:34,DEF"
df <- read.table(text = txt,sep = ",", header = TRUE)
# transforming the date strings into dates
dates <- as.POSIXct(strptime(df$Date, "%d-%m-%Y %H:%M:%S"))
# creating an hourly time sequence from start to end
total_time <- seq(from = floor_date(min(dates), "hour"), to =
ceiling_date(max(dates), "hour"), by = "hour")
# in case there is more than one occurrence per interval
count <- sapply(total_time, function(x) {
sum(floor_date(dates,"hour") %in% x) })
data.frame(Date = strftime(total_time, format = "%d-%m-%Y"),
hours = paste(strftime(total_time, format = "%H:%M"),
strftime(total_time + 60*60, format="%H:%M"),
sep = " - "),
Count = count)
# Date hours Count
# 1 01-01-2017 00:00 - 01:00 1
# 2 01-01-2017 01:00 - 02:00 1
# 3 01-01-2017 02:00 - 03:00 1
# 4 01-01-2017 03:00 - 04:00 0
# 5 01-01-2017 04:00 - 05:00 0
# 6 01-01-2017 05:00 - 06:00 1
# 7 01-01-2017 06:00 - 07:00 0
Upvotes: 1