Reputation: 245
I have the data frame below with two time variables, and I want to get the difference between them in hh:mm:ss format.
Example: the difference 3054900 - 3000000 should be 00:05:49 (hh:mm:ss).
job_start <- c(3000000,11384800,11565400,6434100)
job_end <- c(3054900,12091700,12102900,8232200)
data <- data.frame(job_start ,job_end)
I tried:
data$newdiff <- data$job_end - data$job_start
data$newdiff2 <- format(strptime(data$newdiff, format="%H%M%S"), format = "%H:%M:%S")
head(data)
This does not give the expected result. How can I get the real time difference between these two variables? Simply subtracting one from the other does not give me a value in hh:mm:ss format.
Upvotes: 1
Views: 80
Reputation: 2528
I also came up with a solution, which should at least be a starting point. I pad all the strings to the same length, extract the digits by position, add leading zeros where needed, and then convert the result using the lubridate package.
library(data.table)
library(stringr)
library(lubridate)

dt.data <- data.table(data)

# Note: side = "right" appends zeros, so a short difference such as 54900
# becomes "5490000" and is read as 5H 49M 0S; side = "left" would instead
# treat it as having leading zeros.
dt.data[, .(
  job_start,
  job_end,
  job.diff.hour = str_pad(str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "right"), start = -7, end = -7), width = 2, pad = "0", side = "left"),
  job.diff.minute = str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "right"), start = -6, end = -5),
  job.diff.second = str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "right"), start = -4, end = -3),
  job.diff.millisecond = str_sub(str_pad(job_end - job_start, 7, pad = "0", side = "right"), start = -2),
  job.diff = job_end - job_start
)][, .(Duration = hms(paste(job.diff.hour, job.diff.minute, job.diff.second, sep = ":")))]
Duration
1: 5H 49M 0S
2: 7H 6M 90S
3: 5H 37M 50S
4: 1H 79M 81S
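The pad-and-extract idea can also be sketched in base R. This is only an illustration under an assumption: that the differences follow an 8-digit HHMMSScc layout (two trailing digits for hundredths), which is inferred from the example in the question and not confirmed there. The variable names are made up for the sketch. Padding on the left keeps short values aligned, and the extracted fields are not normalised, so minutes or seconds above 59 can still appear.

```r
# Raw differences job_end - job_start from the question's data
raw_diff <- c(54900, 706900, 537500, 1798100)

# Left-pad to 8 digits so every value has the assumed HHMMSScc layout
padded <- sprintf("%08.0f", raw_diff)

# Extract the fields by position; the last two digits (cc) are ignored
hh <- substr(padded, 1, 2)
mm <- substr(padded, 3, 4)
ss <- substr(padded, 5, 6)

# Fields are pasted as-is, with no carry from seconds to minutes to hours
hms_text <- paste(hh, mm, ss, sep = ":")
print(hms_text)
# [1] "00:05:49" "00:70:69" "00:53:75" "01:79:81"
```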
Upvotes: 1
Reputation: 50718
First off, I agree with @hannes101's comment that fixing the method that generates these pseudo-timestamps is the safer and cleaner way to go.
That aside, it's not difficult to write a parser that converts the difference between job_end and job_start into a string of format "HH:MM:SS".
Here is a vectorised base R approach:
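parse.time <- Vectorize(function(x) {
  # Start positions of the two-digit fields, dropping the last two digits
  idx <- rev(seq(nchar(x) - 1, 0, by = -2)[-1])
  hms <- as.integer(sapply(idx, function(i) substr(x, i, i + 1)))
  # Left-pad with zeros so there are always three fields (hours, minutes, seconds)
  hms <- c(rep(0, 3 - length(hms)), hms)
  # Carry overflow from seconds into minutes, then from minutes into hours
  for (i in length(hms):(length(hms) - 1)) {
    while (hms[i] >= 60) {
      hms[i] <- hms[i] - 60
      hms[i - 1] <- hms[i - 1] + 1
    }
  }
  return(sprintf("%02i:%02i:%02i", hms[1], hms[2], hms[3]))
})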
parse.time(data$job_end - data$job_start)
#[1] "00:05:49" "01:11:09" "00:54:15" "02:20:21"
Sample data:
job_start <- c(3000000,11384800,11565400,6434100)
job_end <- c(3054900,12091700,12102900,8232200)
data <- data.frame(job_start ,job_end)
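If the values are in fact clock times encoded as HHMMSScc (an assumption suggested by the example in the question, not confirmed by it), subtracting the raw numbers borrows in base 100 instead of base 60, so rows that cross a minute or hour boundary come out differently from the true elapsed time. A minimal sketch under that assumption converts each value to seconds first and only then subtracts; to_seconds and format_hms are illustrative names, not part of any package.

```r
job_start <- c(3000000, 11384800, 11565400, 6434100)
job_end   <- c(3054900, 12091700, 12102900, 8232200)

# Convert an assumed HHMMSScc value to seconds since midnight
# (the trailing two digits, cc, are discarded)
to_seconds <- function(x) {
  hh <- x %/% 1e6
  mm <- (x %/% 1e4) %% 100
  ss <- (x %/% 100) %% 100
  hh * 3600 + mm * 60 + ss
}

# Format a number of seconds as "HH:MM:SS"
format_hms <- function(s) {
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60),
          as.integer(s %% 60))
}

elapsed <- to_seconds(job_end) - to_seconds(job_start)
result <- format_hms(elapsed)
print(result)
# [1] "00:05:49" "00:30:29" "00:13:35" "01:39:41"
```

The first row agrees with the expected 00:05:49 from the question. The other rows depend entirely on the assumed encoding being right.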
Upvotes: 1