DeltaIV

Reputation: 5656

computing and formatting averages and squares of time intervals

I have a model that predicts the duration of certain events, along with measured durations for those same events. I want to compute the difference between Predicted and Measured, the mean difference, and the RMSE. I'm able to do it, but the formatting is really awkward and not what I expected:

database <- data.frame(Predicted = c(strptime(c("4:00", "3:35", "3:38"), format = "%H:%M")),
                       Measured = c(strptime(c("3:39", "3:40", "3:53"), format = "%H:%M")))
> database
            Predicted            Measured
1 2016-11-28 04:00:00 2016-11-28 03:39:00
2 2016-11-28 03:35:00 2016-11-28 03:40:00
3 2016-11-28 03:38:00 2016-11-28 03:53:00

This is the first weirdness: why does R show me a date and a time, even though I clearly specified a time-only format (%H:%M) and there was no date in my data to start with? It gets weirder:

database$Error <- with(database, Predicted-Measured)
database$Mean_Error <- with(database, mean(Predicted-Measured))
database$RMSE <- with(database, sqrt(mean(as.numeric(Predicted-Measured)^2)))
> database
            Predicted            Measured    Error Mean_Error     RMSE
1 2016-11-28 04:00:00 2016-11-28 03:39:00  21 mins  0.3333333 15.17674
2 2016-11-28 03:35:00 2016-11-28 03:40:00  -5 mins  0.3333333 15.17674
3 2016-11-28 03:38:00 2016-11-28 03:53:00 -15 mins  0.3333333 15.17674

Why is the variable Error expressed in minutes? For Error it's not a bad choice, but it becomes quite hard to read for Mean_Error. For RMSE it's even worse, but this could be due to the as.numeric function: if I remove it, R complains that '^' not defined for "difftime" objects. My questions are:

  1. Is it possible to show the first 2 columns (Predicted and Measured) in the %H:%M format?
  2. For the other 3 columns (Error, Mean_Error and RMSE), I would like to compare a %M:%S format with a seconds-only format, and choose between the two. Is it possible?

EDIT: just to be clearer, my goal is to insert observations of time intervals into a dataframe and compute a vector of time interval differences, then compute some statistics for that vector: mean, RMSE, etc. I know I could just enter the time observations in seconds, but that doesn't read well: it's difficult to tell that 13200 seconds is 3 hours and 40 minutes. So I would like to store the time intervals in the %H:%M format, but still be able to manipulate them algebraically and show the results in a format of my choosing. Is that possible?

Upvotes: 0

Views: 36

Answers (1)

aichao

Reputation: 7455

We can use difftime to specify the units of the time difference. The output of difftime is an object of class difftime. When this object is coerced to numeric with as.numeric, the units argument lets us convert it to different units (see the examples in ?difftime):

## Note we don't convert to date-time because we just want %H:%M
database <- data.frame(Predicted = c("4:00", "3:35", "3:38"),
                       Measured = c("3:39", "3:40", "3:53"))
## We now convert to date-time and use difftime to compute difference in minutes
database$Error <- with(database,
                       difftime(strptime(Predicted, format = "%H:%M"),
                                strptime(Measured, format = "%H:%M"),
                                units = "mins"))
## Use as.numeric to change units to seconds
database$Mean_Error <- with(database, mean(as.numeric(Error,units="secs")))
database$RMSE <- with(database, sqrt(mean(as.numeric(Error,units="secs")^2)))
##  Predicted Measured    Error Mean_Error     RMSE
##1      4:00     3:39  21 mins         20 910.6042
##2      3:35     3:40  -5 mins         20 910.6042
##3      3:38     3:53 -15 mins         20 910.6042
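
To compare a %M:%S display with plain seconds for the last three columns, one option is to keep the values numeric (in seconds) and convert to text only for display. This is a minimal sketch rather than a built-in facility: fmt_mmss is a hypothetical helper name, and tz = "UTC" is an assumption added here so that the placeholder date filled in by strptime cannot fall on a daylight-saving transition.

```r
# Hypothetical helper (not part of base R): format seconds as signed "M:SS".
fmt_mmss <- function(secs) {
  secs <- round(as.numeric(secs))        # whole seconds, sign kept
  sgn  <- ifelse(secs < 0, "-", "")      # remember the sign separately
  secs <- abs(secs)
  sprintf("%s%d:%02d", sgn, secs %/% 60, secs %% 60)
}

# Assumption: parse in UTC so the implicit date never hits a DST change.
predicted <- strptime(c("4:00", "3:35", "3:38"), format = "%H:%M", tz = "UTC")
measured  <- strptime(c("3:39", "3:40", "3:53"), format = "%H:%M", tz = "UTC")

# Work in plain numeric seconds; '^' and mean() then behave as usual.
err_secs  <- as.numeric(difftime(predicted, measured, units = "secs"))
mean_secs <- mean(err_secs)
rmse_secs <- sqrt(mean(err_secs^2))

format(predicted, "%H:%M")   # "04:00" "03:35" "03:38"
fmt_mmss(err_secs)           # "21:00" "-5:00" "-15:00"
fmt_mmss(mean_secs)          # "0:20"
fmt_mmss(rmse_secs)          # "15:11"
```

Note that the helper rounds to whole seconds, so the RMSE of 910.6042 seconds is displayed as 15:11; keep the numeric column if you need the fractional part.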

Upvotes: 1
