Tamra.y

Reputation: 245

How to get a diff of two integer variables with the formatting hh:mm:ss:00 in R

I have the data frame below with two time variables, and I want to get the difference between them in the format hh:mm:ss.

Example: the difference 3054900 - 3000000 should be 00:05:49 (hh:mm:ss).

job_start <- c(3000000,11384800,11565400,6434100)
job_end   <- c(3054900,12091700,12102900,8232200)
data <- data.frame(job_start ,job_end)

I tried:

data$newdiff <-  data$job_end - data$job_start
data$newdiff2 <- format(strptime(data$newdiff, format="%H%M%S"), format = "%H:%M:%S")

head(data)

It's not giving me the expected results. How can I get the difference between these two variables as a real time difference? When I simply subtract one from the other, the result is not in the hh:mm:ss format.

Upvotes: 1

Views: 80

Answers (2)

hannes101

Reputation: 2528

I also came up with a solution, which should at least be a starting point. I pad all the differences to the same length, extract the hour, minute and second digits by position, zero-pad the hour, and then convert the result with the lubridate package.

library(data.table)
library(stringr)
library(lubridate)

dt.data <- data.table(data)
# Left-pad each difference to 7 digits (H MM SS CC) so the fields sit at fixed positions
dt.data[, .(
          job_start
         , job_end
         , job.diff.hour = str_pad(str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "left"), start = -7, end = -7), width = 2, pad = "0", side = "left")
         , job.diff.minute = str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "left"), start = -6, end = -5)
         , job.diff.second = str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "left"), start = -4, end = -3)
         , job.diff.millisecond = str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "left"), start = -2)
         , job.diff = job_end - job_start)][, .(Duration = hms(paste(job.diff.hour
                                                              , job.diff.minute
                                                              , job.diff.second
                                                              , sep = ":"))
            )]
     Duration
1:     5M 49S
2:    70M 69S
3:    53M 75S
4: 1H 79M 81S
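
The same pad-and-slice idea can be sketched in base R without extra packages. This is only an illustration: the helper name slice_fields is hypothetical, and it assumes the differences are non-negative whole numbers of at most seven digits laid out as H MM SS CC. It pads on the left so that every digit keeps its place value, and it does not roll over minutes or seconds above 59.

```r
# Hypothetical helper: split a difference such as 54900 into its digit fields.
slice_fields <- function(x) {
  padded <- sprintf("%07.0f", x)   # left-pad with zeros to 7 digits, e.g. "0054900"
  data.frame(
    hour   = as.integer(substr(padded, 1, 1)),
    minute = as.integer(substr(padded, 2, 3)),
    second = as.integer(substr(padded, 4, 5)),
    centi  = as.integer(substr(padded, 6, 7))   # trailing two digits
  )
}

job_start <- c(3000000, 11384800, 11565400, 6434100)
job_end   <- c(3054900, 12091700, 12102900, 8232200)

fields <- slice_fields(job_end - job_start)
fields
```

For the first row this gives hour 0, minute 5, second 49. Rows such as 706900 still come out as 70 minutes and 69 seconds, so a carry step is needed before the fields can be read as a clock-style duration.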

Upvotes: 1

Maurits Evers

Reputation: 50718

First off, I agree with @hannes101's comment that fixing the method that generates these pseudo-timestamps is the safer and cleaner way to go.
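
As a rough sketch of what handling this at the source could look like: if each column is itself a clock reading encoded as HHMMSSCC (an assumption, the question does not say so), both values can be decoded to seconds before subtracting. The helper names to_seconds and format_hms are made up for this illustration.

```r
# Assumption: each value is a clock reading encoded as HHMMSSCC (CC = hundredths).
to_seconds <- function(x) {
  x  <- x %/% 100              # drop the hundredths
  ss <- x %% 100               # last two digits: seconds
  mm <- (x %/% 100) %% 100     # next two digits: minutes
  hh <- x %/% 10000            # remaining leading digits: hours
  hh * 3600 + mm * 60 + ss
}

# Format a number of seconds as "HH:MM:SS".
format_hms <- function(secs) {
  sprintf("%02d:%02d:%02d",
          as.integer(secs %/% 3600),
          as.integer((secs %% 3600) %/% 60),
          as.integer(secs %% 60))
}

job_start <- c(3000000, 11384800, 11565400, 6434100)
job_end   <- c(3054900, 12091700, 12102900, 8232200)

elapsed <- format_hms(to_seconds(job_end) - to_seconds(job_start))
elapsed
# [1] "00:05:49" "00:30:29" "00:13:35" "01:39:41"
```

Under this reading the first row agrees with the expected 00:05:49, while the other rows differ from a digit-wise parse of the raw difference because their intervals cross a minute or hour boundary.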

That aside, it's not difficult to write a parser that converts the difference between job_end and job_start into a string of format "HH:MM:SS".

Here is a vectorised base R approach:

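parse.time <- Vectorize(function(x) {
    # Start positions of the two-digit groups, leaving out the trailing pair
    idx <- rev(seq(nchar(x) - 1, 0, by = -2)[-1])
    hms <- as.integer(sapply(idx, function(i) substr(x, i, i + 1)))
    # Left-fill with zeros so there are always three fields: hours, minutes, seconds
    hms <- c(rep(0, 3 - length(hms)), hms)
    # Carry overflow from seconds into minutes, then from minutes into hours
    for (i in length(hms):(length(hms) - 1)) {
        while (hms[i] >= 60) {
            hms[i] <- hms[i] - 60
            hms[i - 1] <- hms[i - 1] + 1
        }
    }
    return(sprintf("%02i:%02i:%02i", hms[1], hms[2], hms[3]))
})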

parse.time(data$job_end - data$job_start)
#[1] "00:05:49" "01:11:09" "00:54:15" "02:20:21"
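
The carry logic can also be written arithmetically instead of with string slicing and a loop. The following is a minimal sketch rather than a replacement for parse.time: diff_to_hms is a hypothetical name, and it assumes the difference is a non-negative whole number whose last two digits are hundredths, preceded by two-digit second and minute fields.

```r
# Hypothetical alternative: read the digit fields with integer division,
# convert them to total seconds, then format. Fields above 59 carry over
# automatically because everything is summed in seconds first.
diff_to_hms <- function(x) {
  x  <- x %/% 100              # drop the trailing two digits
  ss <- x %% 100
  mm <- (x %/% 100) %% 100
  hh <- x %/% 10000
  total <- hh * 3600 + mm * 60 + ss
  sprintf("%02d:%02d:%02d",
          as.integer(total %/% 3600),
          as.integer((total %% 3600) %/% 60),
          as.integer(total %% 60))
}

diffs <- c(54900, 706900, 537500, 1798100)   # job_end - job_start for the sample data
result <- diff_to_hms(diffs)
result
# [1] "00:05:49" "01:11:09" "00:54:15" "02:20:21"
```

Because %/% and %% are vectorised, no Vectorize() wrapper is needed here.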

Sample data

job_start <- c(3000000,11384800,11565400,6434100)
job_end   <- c(3054900,12091700,12102900,8232200)
data <- data.frame(job_start ,job_end)

Upvotes: 1
