user1426485

Reputation: 153

Handling time data that goes over a day in R

I am trying to analyze a series of large CSV files in R, each containing data sampled roughly every 3 seconds. One of the columns is the timestamp recorded during the experiment, and the filename contains the date on which that experiment was performed.

I am trying to attach the date to each timestamp. Naturally, that would just involve combining the date and the time, then converting the result to a date-time with ymd_hms from the lubridate library in R.
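For a recording that stays within a single day, that combination could be sketched roughly as follows. This sketch uses base R's as.POSIXct instead of lubridate so that it runs without extra packages; the yymmdd date format, the UTC time zone, and the variable names are assumptions for illustration.

```r
# Assumption: the filename date is in yymmdd form and times are H:M:S.
file_date <- "181010"
time_sub  <- c("23:59:53", "23:59:55")

# Paste date and time together, then parse the result into a date-time.
stamp <- as.POSIXct(paste(file_date, time_sub),
                    format = "%y%m%d %H:%M:%S", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
```

This works only while every row belongs to the date in the filename, which is exactly what breaks once a recording runs past midnight.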

The challenge: sometimes an experiment runs past midnight, and the data file is not split at that point. Here's what I mean:

>practice[50:55, ]
   time.sub         hms hours
50 23:59:53 23H 59M 53S    23
51 23:59:55 23H 59M 55S    23
52 23:59:57 23H 59M 57S    23
53 23:59:59 23H 59M 59S    23
54    0:0:1          1S     0
55    0:0:3          3S     0

practice$hms is the result of hms(practice$time.sub), and practice$hours is the result of hour(practice$hms).

Suppose this data was recorded on 181010. I want to automatically assign 181011 to the timestamps that come after 23:59:59.

The output I want would look like:

>after_some_smart_thing()
   time.sub         hms hours   date
50 23:59:53 23H 59M 53S    23 181010
51 23:59:55 23H 59M 55S    23 181010
52 23:59:57 23H 59M 57S    23 181010
53 23:59:59 23H 59M 59S    23 181010
54    0:0:1          1S     0 181011
55    0:0:3          3S     0 181011

The best idea I can think of at the moment is to run a for loop that compares each element of hours with the one above it and adds 1 to the date whenever the hour has decreased.

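In code, that would be roughly:

# Day offset for each row: grows by 1 every time the hour drops
n <- nrow(practice)
offset <- integer(n)
for (i in seq_len(n)[-1]) {
  # a drop in the hour means the clock has passed midnight
  rolled <- practice$hours[i] < practice$hours[i - 1]
  offset[i] <- offset[i - 1] + rolled
}
practice$date <- 181010 + offset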

There must be a better way to deal with this, and I am looking for advice on coding it succinctly and at low computational cost. Thanks.

Upvotes: 0

Views: 56

Answers (1)

A. Suliman

Reputation: 13135

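Here is a short way using dplyr::lag. A is the change in hours from the previous row, B flags the rows where the hour drops from 23 to 0, and the cumulative sum of B is the number of days to add to the start date: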

library(dplyr)
df %>% mutate(A=hours-lag(hours), B=if_else(is.na(A) | A!=-23,0,1), date=181010+cumsum(B==1))
  #%>% select(-A,-B) #If you don't need them

  time.sub         hms hours   A B   date
1 23:59:53 23H 59M 53S    23  NA 0 181010
2 23:59:55 23H 59M 55S    23   0 0 181010
3 23:59:57 23H 59M 57S    23   0 0 181010
4 23:59:59 23H 59M 59S    23   0 0 181010
5    0:0:1          1S     0 -23 1 181011
6    0:0:3          3S     0   0 0 181011
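A variation on the same idea, as a minimal sketch in base R: it treats any decrease in the hour (not only a drop of exactly 23) as a midnight crossing, and it stores a real Date, so the count also rolls over correctly at the end of a month (181031 + 1 as a plain number would give 181032). The helper name add_rollover_date and the yymmdd start format are assumptions for illustration, not part of the question.

```r
# Hypothetical helper: assign a calendar date to each row, advancing by one
# day whenever the hour value decreases from one row to the next.
add_rollover_date <- function(hours, start = "181010") {
  # Assumption: `start` is the date from the filename in yymmdd form.
  start_date <- as.Date(start, format = "%y%m%d")
  # diff() < 0 marks rows where the clock wrapped past midnight; the first
  # row never counts as a wrap. Indexing keeps empty input empty.
  wrapped <- c(FALSE, diff(hours) < 0)[seq_along(hours)]
  # cumsum() turns the wrap markers into a running day offset.
  start_date + cumsum(wrapped)
}

hours <- c(23, 23, 23, 23, 0, 0)
dates <- add_rollover_date(hours)
format(dates, "%y%m%d")
```

Like the dplyr version, this relies on consecutive rows being close together in time: a gap of more than a day between two rows cannot be detected from the hour column alone.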

Upvotes: 2
