Ayushi Kachhara
Ayushi Kachhara

Reputation: 159

Datetime/ Day manipulation in R

I have a regular 5 minute interval datetime data sets (about 50). POSIXt/ lubridate functions convert my datetime very nicely to a 24 hour format as required. But I would like to add another column with my day's definition to be from 6 am to 6 am (which is currently midnight to midnight). I am trying to do this to capture after 12AM activity as a part of current date rather than the next one.

I am currently trying to create a group every 288th row (there are 288 5minute intervals in a day). But it creates a problem because my datasets don't necessarily start at a unique time.

I do not want to create offsets because that tampers with the values corresponding to the time.

Any efficient ways around this problem? Thank you.

Upvotes: 2

Views: 121

Answers (2)

Arani
Arani

Reputation: 823

Checkout this question and the corresponding answer: How to manipulate the time part of a date column?

It illustrates a more robust solution as it is independent of your data structure (e.g. repeatition).

Following @meenaparam's solution:

Convert all date columns to dmy_hms format from lubridate package. Please explore other options like dmy_hm or ymd_hms etc, as per your specific need.

mutate(DATE = dmy_hms(DATE))

Now create a column to identify the data points that need to be modified in different ways. Like your data points with 00:00:00 to 05:59:59 (hms) needs to be part of the previous date.

DAY_PAST = case_when(hour(DATE) < 6 ~ "yup", TRUE ~ "nope"))

Now convert the day value of these "yup" dates to day(DATE)-1

NEW_DATE = case_when(DAY_PAST == "yup"
      ~ make_datetime(year(DATE-86400), month(DATE-86400), day = day(DATE-86400), hour = hour(DATE)),
      TRUE ~ DATE)

.

Upvotes: 1

r2evans
r2evans

Reputation: 160407

You can efficiently do it by first generating a sequence of date/times, then using cut to find the bin in which each value falls:

set.seed(2)
dat <- Sys.time() + sort(runif(10, min=0, max=5*24*60*60))
dat
#  [1] "2017-07-29 15:43:10 PDT" "2017-07-29 20:23:12 PDT" "2017-07-29 22:24:22 PDT" "2017-07-31 08:22:57 PDT"
#  [5] "2017-07-31 18:13:06 PDT" "2017-07-31 21:01:10 PDT" "2017-08-01 12:30:19 PDT" "2017-08-02 04:14:03 PDT"
#  [9] "2017-08-02 17:26:14 PDT" "2017-08-02 17:28:52 PDT"
sixs <- seq(as.POSIXct("2017-07-29 06:00:00", tz = "UTC"), as.POSIXct("2017-08-03 06:00:00", tz = "UTC"), by = "day")
sixs
# [1] "2017-07-29 06:00:00 UTC" "2017-07-30 06:00:00 UTC" "2017-07-31 06:00:00 UTC" "2017-08-01 06:00:00 UTC"
# [5] "2017-08-02 06:00:00 UTC" "2017-08-03 06:00:00 UTC"
cut(dat, sixs, label = FALSE)
#  [1] 1 1 1 3 3 3 4 5 5 5

According to the help page (?seq.POSIXt), you might choose by="DSTday" instead.

Upvotes: 1

Related Questions