Reputation: 109
I am trying to identify the average time in a column of a data frame in hr:min:sec format using R. But no luck. Help is much appreciated. Data sample is as below:
Col_Time
03:08:20
03:11:30
03:22:18
03:27:39
My output should be one record which is average of all the numbers in same format like the input column.
Thanks
Upvotes: 4
Views: 7444
Reputation: 81
Note that a common requirement with aggregating time is that time is treated continuously, which can mean that we want 23:50:00 and 0:10:00 to average to 0:00 instead of 12:00.
All answers above are focussed on the scenario that time is continuous around noon and not around midnight.
If I set:
Col_Time = c("23:50:05", "0:10:05")
then
library(chron)
median(chron::times(Col_Time))
gives 12:00:05
library(lubridate)
seconds_to_period(mean(period_to_seconds(hms(Col_Time))))
gives "12H 0M 5S"
format(mean(strptime(Col_Time, "%H:%M:%S")), "%H:%M:%S")
gives "12:00:05"
If you want to avoid external dependencies then I think a bit more coding is necessary. See below a function to impute missing timestamps in Col_Time:
imputeTime = function(Col_Time) {
valid = which(Col_Time != "")
invalid = which(Col_Time == "")
if (length(valid) > 0 && length(invalid) > 0) {
# Express time as hour since the previous noon
timeHR = rep(NA, length(valid))
for (i in 1:length(valid)) {
time = as.numeric(unlist(strsplit(Col_Time[valid[i]],":")))
time[1] = ifelse(time[1] <= 12, yes = time[1] + 24, no = time[1])
timeHR[i] = sum(time / c(1, 60, 3600))
}
# Calculate median
med_time = median(timeHR)
# Convert back to clock time
med_time = ifelse(med_time >= 24, yes = med_time - 24, no = med_time)
hr = floor(med_time)
min = floor((med_time - hr) * 60)
sec = round((med_time - (hr + (min / 60))) * 3600, digits = 2)
newvalue = format(strptime(x = paste0(hr, ":", min, ":", sec), "%H:%M:%S"), "%H:%M:%S")
# impute missing values
Col_Time[invalid] = newvalue
}
return(Col_Time)
}
Col_Time = c("23:50:05", "", "0:10:05")
Col_Time = imputeTime(Col_Time)
which gives "23:50:05" "00:00:05" "0:10:05"
Upvotes: 0
Reputation: 887891
We can do this without using any external packages
format(mean(strptime(Col_Time, "%H:%M:%S")), "%H:%M:%S")
#[1] "03:17:26"
Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
Upvotes: 1
Reputation: 389265
One way with lubridate
package. We convert the time into seconds, take the mean of it and then convert those seconds to time again.
Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
library(lubridate)
seconds_to_period(mean(period_to_seconds(hms(Col_Time))))
#[1] "3H 17M 26.75S"
Upvotes: 5
Reputation: 32558
You can use chron
library. Specifically, times
function. Note that times
internally represents time as a numeric value (decimal days).
Col_Time = c('03:08:20','03:11:30','03:22:18','03:27:39')
library(chron)
mean(times(Col_Time))
#[1] 03:17:27
Upvotes: 11