Jahnab Kumar Deka
Jahnab Kumar Deka

Reputation: 109

average time in a column in hr:min:sec format

I am trying to identify the average time in a column of a data frame in hr:min:sec format using R. But no luck. Help is much appreciated. Data sample is as below:

Col_Time
03:08:20
03:11:30
03:22:18
03:27:39

My output should be one record which is average of all the numbers in same format like the input column.

Thanks

Upvotes: 4

Views: 7444

Answers (4)

V. van Hees
V. van Hees

Reputation: 81

Note that a common requirement with aggregating time is that time is treated continuously, which can mean that we want 23:50:00 and 0:10:00 to average to 0:00 instead of 12:00.

All answers above are focussed on the scenario that time is continuous around noon and not around midnight.

If I set:

Col_Time = c("23:50:05", "0:10:05")

then

library(chron)
median(chron::times(Col_Time))

gives 12:00:05

library(lubridate)
seconds_to_period(mean(period_to_seconds(hms(Col_Time))))

gives "12H 0M 5S"

format(mean(strptime(Col_Time, "%H:%M:%S")), "%H:%M:%S")

gives "12:00:05"

If you want to avoid external dependencies then I think a bit more coding is necessary. See below a function to impute missing timestamps in Col_Time:

imputeTime = function(Col_Time) {
  valid = which(Col_Time != "")
  invalid = which(Col_Time == "")
  if (length(valid) > 0 && length(invalid) > 0) {
    # Express time as hour since the previous noon
    timeHR = rep(NA, length(valid))
    for (i in 1:length(valid)) {
      time = as.numeric(unlist(strsplit(Col_Time[valid[i]],":")))
      time[1] = ifelse(time[1] <= 12, yes = time[1] + 24, no = time[1])
      timeHR[i] = sum(time / c(1, 60, 3600))
    }
    # Calculate median
    med_time = median(timeHR)
    # Convert back to clock time
    med_time = ifelse(med_time >= 24, yes = med_time - 24, no = med_time)
    hr = floor(med_time)
    min = floor((med_time - hr) * 60)
    sec = round((med_time - (hr + (min / 60))) * 3600, digits = 2)
    newvalue = format(strptime(x = paste0(hr, ":", min, ":", sec), "%H:%M:%S"), "%H:%M:%S")
    
    # impute missing values       
    Col_Time[invalid] = newvalue
  }
  return(Col_Time)
}

Col_Time = c("23:50:05", "", "0:10:05")
Col_Time = imputeTime(Col_Time)

which gives "23:50:05" "00:00:05" "0:10:05"

Upvotes: 0

akrun
akrun

Reputation: 887891

We can do this without using any external packages

format(mean(strptime(Col_Time, "%H:%M:%S")), "%H:%M:%S")
#[1] "03:17:26"

data

Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389265

One way with lubridate package. We convert the time into seconds, take the mean of it and then convert those seconds to time again.

Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
library(lubridate)
seconds_to_period(mean(period_to_seconds(hms(Col_Time))))
#[1] "3H 17M 26.75S"

Upvotes: 5

d.b
d.b

Reputation: 32558

You can use chron library. Specifically, times function. Note that times internally represents time as a numeric value (decimal days).

Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
library(chron)
mean(times(Col_Time))
#[1] 03:17:27

Upvotes: 11

Related Questions