Reputation: 5656
I have a model which predicts the duration of certain events, and measures of durations for those events. I then want to compute the difference between Predicted and Measured, the mean difference and the RMSE. I'm able to do it, but the formatting is really awkward and not what I expected:
database <- data.frame(Predicted = c(strptime(c("4:00", "3:35", "3:38"), format = "%H:%M")),
Measured = c(strptime(c("3:39", "3:40", "3:53"), format = "%H:%M")))
database
> Predicted Measured
1 2016-11-28 04:00:00 2016-11-28 03:39:00
2 2016-11-28 03:35:00 2016-11-28 03:40:00
3 2016-11-28 03:38:00 2016-11-28 03:53:00
This is the first weirdness: why does R shows me a time and a date, even if I clearly specified a time-only format (%H:%M
), and there was no date in my data to start with? It gets weirder:
database$Error <- with(database, Predicted-Measured)
database$Mean_Error <- with(database, mean(Predicted-Measured))
database$RMSE <- with(database, sqrt(mean(as.numeric(Predicted-Measured)^2)))
> database
Predicted Measured Error Mean_Error RMSE
1 2016-11-28 04:00:00 2016-11-28 03:39:00 21 mins 0.3333333 15.17674
2 2016-11-28 03:35:00 2016-11-28 03:40:00 -5 mins 0.3333333 15.17674
3 2016-11-28 03:38:00 2016-11-28 03:53:00 -15 mins 0.3333333 15.17674
Why is the variable Error
expressed in minutes? For Error
it's not a bad choice, but it becomes quite hard to read for Mean_Error
. For RMSE
it's even worse, but this could be due to the as.numeric
function: if I remove it, R complains that '^' not defined for "difftime" objects
. My questions are:
Predicted
and Measured
) shown in the %H:%M
format?Error
, Mean_Error
and RMSE
) I would like to compare a %M:%S
format and a format in only seconds, and choose among the two. Is it possible? EDIT: just to be more clear, my goal is to insert observations of time intervals into a dataframe and compute a vector of time interval differences. Then, compute some statistics for that vector: mean, RMSE, etc.. I know I could just enter the time observations in seconds, but that doesn't look very good: it's difficult to tell that 13200 seconds are 3 hours and 40 minutes. Thus I would like to be able to store the time intervals in the %H:%M
, but then be able to manipulate them algebraically and show the results in a format of my choosing. Is that possible?
Upvotes: 0
Views: 36
Reputation: 7455
We can use difftime
to specify the units for the difference in time. The output of difftime
is an object of class difftime
. When this difftime
object is coerced to numeric using as.numeric
, we can change these units (see the examples in ?difftime
):
## Note we don't convert to date-time because we just want %H:%M
database <- data.frame(Predicted = c("4:00", "3:35", "3:38"),
Measured = c("3:39", "3:40", "3:53"))
## We now convert to date-time and use difftime to compute difference in minutes
database$Error <- with(database, difftime(strptime(Predicted,format="%H:%M"),strptime(Measured,format="%H:%M"), units="mins"))
## Use as.numeric to change units to seconds
database$Mean_Error <- with(database, mean(as.numeric(Error,units="secs")))
database$RMSE <- with(database, sqrt(mean(as.numeric(Error,units="secs")^2)))
## Predicted Measured Error Mean_Error RMSE
##1 4:00 3:39 21 mins 20 910.6042
##2 3:35 3:40 -5 mins 20 910.6042
##3 3:38 3:53 -15 mins 20 910.6042
Upvotes: 1