Heng
Heng

Reputation: 41

Time format and calculations in R

I have the following dataframe:

enter image description here

I have several questions on how to format and work with the dataframe (all my columns are character type of variables).

First, is there a way to reformat the Start column to a time type variable from a character variable.

Just like as.Date to format date, is there an as.time function in R ?

Second, I have two values in my Start and Stop columns that are not properly displayed as a 24 hour time format, is there any way to change them to the following :

24:46:00 to 0:46:00 and 25:14:00 to 01:14:00

Third, I would like to merge the date and Start column to one column as the following:

2019-05-12 18:11

I have tried the following: but it gives me N/As as value for the new variable.

df$DT <- as.POSIXct(paste(df$Date, df$Start), format="%d-%m-%Y %H:%M:%S")

Last I would like to calculate the difference between the start and stop time, I have done the following and it returns me with an error message :

df$Time_diff = difftime(df$Stop, df$Start, units = "mins")

the error message are the following:

Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format

Could anyone help me with the above four points (I would like to write the code with base R without importing any library)?

Thank you !

Upvotes: 1

Views: 426

Answers (2)

Andre Wildberg
Andre Wildberg

Reputation: 19191

Alternative solution, using base R:

a
# [1,] "25:45:00" "17:34" "2020-09-17"
# [2,] "13:43"    "13:34" "2019-04-12"
# [3,] "17:56:00" "12:45" "2019-11-03"

b <- data.frame( t(apply(
 a[,1:2], 1, function(x) sapply(
 x, function(y) if(as.numeric(substr(y,1,2)) > 23){
 paste0(as.numeric(substr(y,1,2))-24,":",as.numeric(substr(y,4,5)) ) }
 else{ y } ) )), a[,3] )
#         V1     a     a...3.
# 1     1:45 17:34 2020-09-17
# 2    13:43 13:34 2019-04-12
# 3 17:56:00 12:45 2019-11-03

cc <- data.frame( Start=as.POSIXlt(strptime( paste( b[,3], b[,1] ), "%Y-%m-%d %H:%M" )),
 Stop=as.POSIXlt(strptime( paste( b[,3], b[,2] ), "%Y-%m-%d %H:%M" ) ) )
#                 Start                Stop
# 1 2020-09-17 01:34:00 2020-09-17 17:34:00
# 2 2019-04-12 13:43:00 2019-04-12 13:34:00
# 3 2019-11-03 17:56:00 2019-11-03 12:45:00
diff.Date(cc)
#        Stop
# 1  960 mins
# 2   -9 mins
# 3 -311 mins

Converting only a time ("17:45") to as.Date ultimately always results in "recent Date" + Time.

Upvotes: 1

akrun
akrun

Reputation: 887881

There are different formats in 'Start' column. An option is to use parse_date_time for correctly parsing different formats

library(lubridate)
parse_date_time(c("2019-05-12 18:11", "2019-10-20 24:46:00"),
       c("ymd HMS", "ymd HM"))
#[1] "2019-05-12 18:11:00 UTC" "2019-10-21 00:46:00 UTC"

Using the OP's code

strt <- with(df, parse_date_time(paste(Date, Start), c("ymd HMS", "ymd HM")))
stop <- with(df, parse_date_time(paste(Date, Stop), c("ymd HMS", "ymd HM")))

and now we do the difftime

difftime(stop, strt, units = "mins")

Or another option is anytime

library(anytime)
anytime(c("2019-05-12 18:11", "2019-10-20 24:46:00"))

If there are only two formats for the 'Start', 'Stop' column, then we can do this in two steps

fmt1 <- "%d-%m-%Y %H:%M:%S"
fmt2 <- "%d-%m-%Y %H:%M" 
strt <- with(df, as.POSIXct(paste(Date, Start), format = fmt1))
i1 <- is.na(strt)
str1[i1] <- with(df, as.POSIXct(paste(Date, Start)[i1], format = fmt2))
stop <- with(df, as.POSIXct(paste(Date, Stop), format = fmt1))
i2 <- is.na(stop)
stop[i2] <- with(df, as.POSIXct(paste(Date, Stop[i2], format = fmt2))

Update

For the second part of the question, ie. to convert to 24 hour period

str1 <- "25:14:00"
sapply(strsplit(str1, ":"), function(x) {
     x1 <- as.numeric(x)
     i1 <- x[1] >= 24
      x1[1][i1] <- x1[1][i1] - 24
      do.call(sprintf, c(as.list(x1), fmt = "%02d:%02d:%02d"))
  })
#[1] "01:14:00"

Upvotes: 1

Related Questions