Reputation: 159
I have a regular 5 minute interval datetime data sets (about 50). POSIXt/ lubridate
functions convert my datetime very nicely to a 24 hour format as required. But I would like to add another column with my day's definition to be from 6 am to 6 am (which is currently midnight to midnight). I am trying to do this to capture after 12AM activity as a part of current date rather than the next one.
I am currently trying to create a group every 288th row (there are 288 5minute intervals in a day). But it creates a problem because my datasets don't necessarily start at a unique time.
I do not want to create offsets because that tampers with the values corresponding to the time.
Any efficient ways around this problem? Thank you.
Upvotes: 2
Views: 121
Reputation: 823
Checkout this question and the corresponding answer: How to manipulate the time part of a date column?
It illustrates a more robust solution as it is independent of your data structure (e.g. repeatition).
Following @meenaparam's solution:
Convert all date columns to dmy_hms
format from lubridate
package. Please explore other options like dmy_hm
or ymd_hms
etc, as per your specific need.
mutate(DATE = dmy_hms(DATE))
Now create a column to identify the data points that need to be modified in different ways. Like your data points with 00:00:00 to 05:59:59 (hms) needs to be part of the previous date.
DAY_PAST = case_when(hour(DATE) < 6 ~ "yup", TRUE ~ "nope"))
Now convert the day
value of these "yup"
dates to day(DATE)-1
NEW_DATE = case_when(DAY_PAST == "yup"
~ make_datetime(year(DATE-86400), month(DATE-86400), day = day(DATE-86400), hour = hour(DATE)),
TRUE ~ DATE)
.
Upvotes: 1
Reputation: 160407
You can efficiently do it by first generating a seq
uence of date/times, then using cut
to find the bin in which each value falls:
set.seed(2)
dat <- Sys.time() + sort(runif(10, min=0, max=5*24*60*60))
dat
# [1] "2017-07-29 15:43:10 PDT" "2017-07-29 20:23:12 PDT" "2017-07-29 22:24:22 PDT" "2017-07-31 08:22:57 PDT"
# [5] "2017-07-31 18:13:06 PDT" "2017-07-31 21:01:10 PDT" "2017-08-01 12:30:19 PDT" "2017-08-02 04:14:03 PDT"
# [9] "2017-08-02 17:26:14 PDT" "2017-08-02 17:28:52 PDT"
sixs <- seq(as.POSIXct("2017-07-29 06:00:00", tz = "UTC"), as.POSIXct("2017-08-03 06:00:00", tz = "UTC"), by = "day")
sixs
# [1] "2017-07-29 06:00:00 UTC" "2017-07-30 06:00:00 UTC" "2017-07-31 06:00:00 UTC" "2017-08-01 06:00:00 UTC"
# [5] "2017-08-02 06:00:00 UTC" "2017-08-03 06:00:00 UTC"
cut(dat, sixs, label = FALSE)
# [1] 1 1 1 3 3 3 4 5 5 5
According to the help page (?seq.POSIXt
), you might choose by="DSTday"
instead.
Upvotes: 1