Write a faster time/character change function?

Question

I wrote this function to convert the time column in a data frame to seconds.milliseconds. However, when I run it on 50,000+ rows of the data frame, the character conversion takes a long time. Is there a better way to write this code so that it runs faster?

Here is the function...

time_converter.rFUNC <- function(x) {
  for (column_name in colnames(x)[c(1)]) {
    for (row in 1:nrow(x)) {
      left_char <- as.data.frame(substr(x[row, column_name], nchar(x[row, column_name]) - (7), nchar(x[row, column_name]) - (6)))
      mid_char <- as.data.frame(substr(x[row, column_name], nchar(x[row, column_name]) - (4), nchar(x[row, column_name]) - (3)))
      right_char <- as.data.frame(substr(x[row, column_name], nchar(x[row, column_name]) - (2-1), nchar(x[row, column_name])))
      int.time <- cbind(left_char, mid_char, right_char)
      colnames(int.time) = c("min", "sec", "ms")
      int.time$min <- as.numeric(as.character(int.time$min))
      int.time$sec <- as.numeric(as.character(int.time$sec))
      int.time$ms <- as.numeric(as.character(int.time$ms))
      int.time$min <- ifelse(int.time$min > 0, int.time$min*60,0)
      int.time$ms <- int.time$ms/100
      x[row, column_name] <- as.data.frame(rowSums(int.time))
    }
  }
  return(x)
}
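For comparison, here is a sketch of the same substr() idea applied to the whole column at once instead of cell by cell, since substr(), nchar() and as.numeric() all accept vectors. The name time_converter_vec and the col argument are hypothetical. It assumes fixed-width two-digit minutes and seconds and exactly two fractional digits. Unlike the loop, it also adds the hours field.

```r
# Hypothetical vectorized version of time_converter.rFUNC. Assumes strings like
# "000:00:00.05" (hours:minutes:seconds.fraction) with two-digit minutes and
# seconds and exactly two fractional digits; the hours field may be any width.
time_converter_vec <- function(x, col = 1) {
  tm <- as.character(x[[col]])  # also works if the column was read as a factor
  n  <- nchar(tm)
  hours <- as.numeric(substr(tm, 1, n - 9))      # everything before ":MM:SS.cc"
  mins  <- as.numeric(substr(tm, n - 7, n - 6))  # "MM"
  secs  <- as.numeric(substr(tm, n - 4, n))      # "SS.cc", fraction included
  x[[col]] <- hours * 3600 + mins * 60 + secs
  x
}

dd <- data.frame(time = c("000:00:00.05", "000:00:00.10", "001:00:00.15"),
                 x = c(8.5, 4.0, 8.5))
time_converter_vec(dd)
```

This replaces the per-row data-frame reads and writes, which are typically the slow part of loops like the original, with a handful of vectorized calls. It has not been benchmarked here.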

Here are the first few rows of my data:

          time    x     y    z
1 000:00:00.05 8.50 10.00 6.50
2 000:00:00.10 4.00 10.00 6.50
3 000:00:00.15 8.50 10.00 6.50
4 000:00:00.20 3.50 10.00 6.50
5 000:00:00.25 3.50 10.00 6.50
6 000:00:00.30 3.00 10.00 6.50

Note: The time column in my raw data always contains the same number of characters, which is why I used "nchar".

This is the desired output for the sample above:

  time    x     y    z
1 0.05 8.50 10.00 6.50
2  0.1 4.00 10.00 6.50
3 0.15 8.50 10.00 6.50
4  0.2 3.50 10.00 6.50
5 0.25 3.50 10.00 6.50
6  0.3 3.00 10.00 6.50

Edit: The raw time data is in the format Hours:Minutes:Seconds.Milliseconds, and I want to convert it to seconds.milliseconds only.

r2evans · Accepted Answer

If your time variable never goes above 23 hours, then I think @rawr's comment (augmented to include hours) will suffice:

dd <- read.table(header = TRUE, text = "
          time    x     y    z
1 000:00:00.05 8.50 10.00 6.50
2 000:00:00.10 4.00 10.00 6.50
3 000:00:00.15 8.50 10.00 6.50
4 000:00:00.20 3.50 10.00 6.50
5 000:00:00.25 3.50 10.00 6.50
6 000:00:00.30 3.00 10.00 6.50")

x <- strptime(dd$time, '0%H:%M:%OS'); x$sec + x$min * 60 + x$hour * 3600
# [1] 0.05 0.10 0.15 0.20 0.25 0.30

However, judging from the extra zeroes in the hours component, your hours may exceed 23, and strptime returns NA for any hour above 23. Here's a way around it:

dd <- read.table(header = TRUE, text = "
          time    x     y    z
1 000:00:00.05 8.50 10.00 6.50
2 001:00:00.10 4.00 10.00 6.50
3 010:00:00.15 8.50 10.00 6.50
4 020:00:00.20 3.50 10.00 6.50
5 030:00:00.25 3.50 10.00 6.50
6 100:00:00.30 3.00 10.00 6.50")
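time2num <- function(x) {
  # split "HHH:MM:SS.ss" into its three fields and weight them as
  # hours * 3600 + minutes * 60 + seconds (as.character() guards against factors)
  vapply(strsplit(as.character(x), ':'), function(y) sum(as.numeric(y) * c(60*60, 60, 1)),
         numeric(1), USE.NAMES=FALSE)
}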

dd$seconds1 <- with(list(x = strptime(dd$time, '0%H:%M:%OS')), x$sec + x$min * 60 + x$hour * 3600)
dd$seconds2 <- time2num(dd$time)
dd
#           time   x  y   z seconds1  seconds2
# 1 000:00:00.05 8.5 10 6.5     0.05      0.05
# 2 001:00:00.10 4.0 10 6.5  3600.10   3600.10
# 3 010:00:00.15 8.5 10 6.5 36000.15  36000.15
# 4 020:00:00.20 3.5 10 6.5 72000.20  72000.20
# 5 030:00:00.25 3.5 10 6.5       NA 108000.25
# 6 100:00:00.30 3.0 10 6.5       NA 360000.30

While this generally performs faster (on both small and larger datasets, tested up to 6000 rows), it does no input validation: if anything is amiss, it warns ("NAs introduced by coercion") and returns NA.
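If some validation is wanted, one option is to test each string against a regular expression before converting. This is a sketch, not part of the original answer. The name time2num_checked, the pattern (minutes and seconds below 60, optional fractional seconds) and the NA-plus-one-warning policy are all assumptions. It also swaps the per-element vapply() for a single matrix product.

```r
# Hypothetical validating variant of time2num(). Values are first checked
# against an assumed "H:MM:SS" pattern (optional fractional seconds); only
# matching values are converted, the rest become NA with a single warning.
time2num_checked <- function(x) {
  x  <- as.character(x)
  ok <- grepl("^[0-9]+:[0-5][0-9]:[0-5][0-9](\\.[0-9]+)?$", x)  # NA -> FALSE
  out <- rep(NA_real_, length(x))
  if (any(ok)) {
    # every validated value has exactly three fields, so the pieces can be
    # reshaped safely into an n x 3 matrix (hours, minutes, seconds)
    parts <- unlist(strsplit(x[ok], ":", fixed = TRUE))
    m <- matrix(as.numeric(parts), ncol = 3, byrow = TRUE)
    out[ok] <- drop(m %*% c(3600, 60, 1))
  }
  if (!all(ok)) {
    warning(sum(!ok), " value(s) not in H:MM:SS format; returned NA")
  }
  out
}

time2num_checked(c("000:00:00.05", "100:00:00.30", "12:75:00", NA))
```

Checking first is what makes the single matrix reshape safe. Without it, one value with a missing or extra ":" would shift the fields of every later row.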
