Reputation: 141
I wrote this function to convert the time column of a data frame to seconds.milliseconds. However, when I run it on a data frame with 50,000+ rows, it takes a long time to convert the character values. Is there a better way to write this code so that it runs faster?
Here is the function...
time_converter.rFUNC <- function(x) {
  for (column_name in colnames(x)[c(1)]) {
    for (row in 1:nrow(x)) {
      left_char <- as.data.frame(substr(x[row, column_name], nchar(x[row, column_name]) - (7), nchar(x[row, column_name]) - (6)))
      mid_char <- as.data.frame(substr(x[row, column_name], nchar(x[row, column_name]) - (4), nchar(x[row, column_name]) - (3)))
      right_char <- as.data.frame(substr(x[row, column_name], nchar(x[row, column_name]) - (2-1), nchar(x[row, column_name])))
      int.time <- cbind(left_char, mid_char, right_char)
      colnames(int.time) = c("min", "sec", "ms")
      int.time$min <- as.numeric(as.character(int.time$min))
      int.time$sec <- as.numeric(as.character(int.time$sec))
      int.time$ms <- as.numeric(as.character(int.time$ms))
      int.time$min <- ifelse(int.time$min > 0, int.time$min*60,0)
      int.time$ms <- int.time$ms/100
      x[row, column_name] <- as.data.frame(rowSums(int.time))
    }
  }
  return(x)
}
Here is a sample header of my data...
time x y z
1 000:00:00.05 8.50 10.00 6.50
2 000:00:00.10 4.00 10.00 6.50
3 000:00:00.15 8.50 10.00 6.50
4 000:00:00.20 3.50 10.00 6.50
5 000:00:00.25 3.50 10.00 6.50
6 000:00:00.30 3.00 10.00 6.50
***Note: The output of time column in my raw data set always contains the same number of characters. That's why I chose to use "nchar".
This would be the desired output of the sample given...
time x y z
1 0.05 8.50 10.00 6.50
2 0.1 4.00 10.00 6.50
3 0.15 8.50 10.00 6.50
4 0.2 3.50 10.00 6.50
5 0.25 3.50 10.00 6.50
6 0.3 3.00 10.00 6.50
***Edit Note: The raw output data is in the format Hours:Minutes:Seconds.Milliseconds.
I want to convert the time into just seconds.milliseconds.
Upvotes: 0
Views: 64
Reputation: 160492
If your time variable never goes above 23 hours, then I think @rawr's comment (augmented to include hours) will suffice:
dd <- read.table(header = TRUE, text = "
time x y z
1 000:00:00.05 8.50 10.00 6.50
2 000:00:00.10 4.00 10.00 6.50
3 000:00:00.15 8.50 10.00 6.50
4 000:00:00.20 3.50 10.00 6.50
5 000:00:00.25 3.50 10.00 6.50
6 000:00:00.30 3.00 10.00 6.50")
x <- strptime(dd$time, '0%H:%M:%OS'); x$sec + x$min * 60 + x$hour * 3600
# [1] 0.05 0.10 0.15 0.20 0.25 0.30
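To get the layout asked for in the question, the same expression can overwrite the time column in place. This is only a minimal sketch, assuming the hours stay below 24 and the column holds text (character or factor); the name parsed and the tz = "UTC" argument are my own additions, not part of the comment being quoted.

```r
# Small stand-in for the question's data (only two columns, for brevity)
dd <- data.frame(
  time = c("000:00:00.05", "000:00:00.10", "000:00:00.15"),
  x = c(8.5, 4.0, 8.5),
  stringsAsFactors = FALSE
)

# Parse the whole column in one vectorized call; the literal "0" in the
# format consumes the extra leading zero of the three-digit hours field
parsed <- strptime(as.character(dd$time), "0%H:%M:%OS", tz = "UTC")

# Replace the text column with numeric seconds
dd$time <- parsed$sec + parsed$min * 60 + parsed$hour * 3600
```

Because strptime works on the full vector at once, no per-row loop is needed.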
However, judging from the additional zeroes in the hours component, your hours may go beyond 23, in which case strptime will return NA. Here's a way around it:
dd <- read.table(header = TRUE, text = "
time x y z
1 000:00:00.05 8.50 10.00 6.50
2 001:00:00.10 4.00 10.00 6.50
3 010:00:00.15 8.50 10.00 6.50
4 020:00:00.20 3.50 10.00 6.50
5 030:00:00.25 3.50 10.00 6.50
6 100:00:00.30 3.00 10.00 6.50")
time2num <- function(x) {
  vapply(strsplit(x, ':'), function(y) sum(as.numeric(y) * c(60*60, 60, 1)),
         numeric(1), USE.NAMES=FALSE)
}
dd$seconds1 <- with(list(x = strptime(dd$time, '0%H:%M:%OS')), x$sec + x$min * 60 + x$hour * 3600)
dd$seconds2 <- time2num(dd$time)
dd
# time x y z seconds1 seconds2
# 1 000:00:00.05 8.5 10 6.5 0.05 0.05
# 2 001:00:00.10 4.0 10 6.5 3600.10 3600.10
# 3 010:00:00.15 8.5 10 6.5 36000.15 36000.15
# 4 020:00:00.20 3.5 10 6.5 72000.20 72000.20
# 5 030:00:00.25 3.5 10 6.5 NA 108000.25
# 6 100:00:00.30 3.0 10 6.5 NA 360000.30
While time2num generally performs faster (on both small and larger datasets, tested up to 6000 rows), it does no verification of its input; if anything is amiss, it will warn (NAs introduced by coercion) and return NA.
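If you do want some verification, one option is to check the format with a regular expression before converting. The following is a rough sketch rather than a tested drop-in: time2num_checked is a hypothetical name, and the pattern assumes any number of hour digits, two-digit minutes and seconds in the range 00-59, and an optional fractional part.

```r
# Hypothetical helper (not part of the answer above): validate, then convert
time2num_checked <- function(x) {
  x <- as.character(x)  # also copes with factor columns
  # Assumed shape: hours:MM:SS with an optional fractional part
  ok <- grepl("^[0-9]+:[0-5][0-9]:[0-5][0-9](\\.[0-9]+)?$", x)
  out <- rep(NA_real_, length(x))
  parts <- strsplit(x[ok], ":", fixed = TRUE)
  # Same arithmetic as time2num, applied only to well-formed values
  out[ok] <- vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)),
                    numeric(1), USE.NAMES = FALSE)
  if (any(!ok)) {
    warning(sum(!ok), " value(s) did not match hours:MM:SS; returning NA for them")
  }
  out
}

time2num_checked(c("100:00:00.30", "oops"))
# 360000.3 and NA, with a warning naming the number of bad values
```

Malformed entries still come back as NA, but the warning now says how many values failed the format check instead of the generic coercion message.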
Upvotes: 1