Markus Germar

Reputation: 223

R: transform irregular time strings

I have two time series from different data frames, each in its own irregular format, but the problem is the same for both: I want to extract only the hours, minutes, seconds and milliseconds.

The time series look like this:

ts1

08:27:23,445
08:27:24,280
08:27:25,115
...

I tried

strptime("08:27:23,445", "%H:%M:%OS")
[1] "2013-05-23 08:27:23"

I lose the millisecond information and get date information that is useless to me.

ts2

Fri Apr 19 2013 08:39:41 GMT+0200
Fri Apr 19 2013 08:39:43 GMT+0200
Fri Apr 19 2013 08:39:45 GMT+0200
...

I tried

strptime("Fri Apr 19 2013 08:39:41 GMT+0200", "%a %b %d %Y %H:%M:%S %Z")
[1] NA

In the end, I want to transform ts1 and ts2 into new time series that share the same format (with milliseconds), for example:

ts1

08:27:23,445

ts2

08:39:41,000

Having the same format matters to me because I want to work with both time series later on, e.g. matching them, calculating differences, etc.

Thank you for your help!

UPDATE: added dput output

Both datasets are very long, which is why I cut them down.

ts1

structure(list(t = structure(1:9, .Label = c("08:27:23,445",
    "08:27:24,280", "08:27:25,115", "08:27:25,960", "08:27:26,780",
    "08:27:27,540", "08:27:28,295", "08:27:29,075", "08:27:29,910"),
    class = "factor")), .Names = "t", row.names = c(NA, -9L),
    class = "data.frame")

ts2

structure(list(t = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 7L, 8L),
    .Label = c("Fri Apr 19 2013 08:39:41 GMT+0200", "Fri Apr 19 2013 08:39:43 GMT+0200",
    "Fri Apr 19 2013 08:39:45 GMT+0200", "Fri Apr 19 2013 08:39:49 GMT+0200",
    "Fri Apr 19 2013 08:39:51 GMT+0200", "Fri Apr 19 2013 08:39:53 GMT+0200",
    "Fri Apr 19 2013 08:39:59 GMT+0200", "Fri Apr 19 2013 08:40:05 GMT+0200",
    "Fri Apr 19 2013 08:40:06 GMT+0200"), class = "factor")),
    .Names = "t", row.names = c(NA, -9L), class = "data.frame")

Upvotes: 5

Views: 148

Answers (2)

datawookie

Reputation: 6554

I think the key to getting the %OS specifier to work is converting the decimal separator from "," to ".". The following works for me:

> ts1 = data.frame(t = c("08:27:23,445", "08:27:24,280", "08:27:25,115",
+                        "08:27:25,960", "08:27:26,780", "08:27:27,540",
+                        "08:27:28,295", "08:27:29,075", "08:27:29,910"),  
+                  stringsAsFactors = FALSE)
> 
> ts1$t = sub(",", ".", ts1$t)
> 
> T = apply(ts1, 1, function(n) {(strptime(n, format = "%H:%M:%OS"))})
> 
> T[[1]]                               # ****
                    t 
"2013-12-22 08:27:23" 
> sprintf("%.3f", T[[1]])
[1] "1387693643.445"

Here you can see that if you convert the time structure to a numeric, the fractional seconds are definitely there!

However, it is also possible that your locale expects the comma as the decimal separator, in which case you may find that the fractional seconds are already there but simply do not show up when you print the time structure (see the line marked **** above, where the fractional part is not displayed).
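
As a minimal sketch of how the hidden fraction could be made visible, the lines below parse the same strings and then ask format() for three decimal places with %OS3. The dummy date 1970-01-01 and the UTC time zone are assumptions added here so that the numeric value equals the seconds since midnight; they are not part of the original data.

```r
# Times with a comma as the decimal separator (values from the question).
x <- c("08:27:23,445", "08:27:24,280", "08:27:25,115")

# Swap the comma for a dot so that %OS can read the fractional seconds.
# Assumption: a fixed dummy date and UTC are prepended, because strptime()
# otherwise fills in the current date and the local time zone.
parsed <- as.POSIXct(paste("1970-01-01", sub(",", ".", x, fixed = TRUE)),
                     format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# With the dummy date above, this is seconds since midnight,
# including the fractional part.
secs <- as.numeric(parsed)

# %OS3 prints seconds with three decimal places. The digits are truncated
# rather than rounded, so the last digit may be one lower than expected.
shown <- format(parsed, "%H:%M:%OS3")
print(shown)
```

Setting options(digits.secs = 3) is another way to have fractional seconds printed by default, with the same truncation caveat.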

Upvotes: 0

mCorey

Reputation: 295

Below is a quick sapply function that might help, IF you have a fixed zero point, for example if you only want to compare activities from 0:00 (midnight) until 23:59:59,999 on the same day. If so, you can convert the time into another form (minutes in my example) and see how long, say, a single activity takes.

Using your example for ts1:

Make a vector of times (as characters)

time <- c("08:27:23,445",
          "08:27:24,280",
          "08:27:25,115")

Change the comma to a colon, for ease of splitting

time.new <- gsub(",", ":", time)

Compute decimal minutes

time.mins <- sapply(strsplit(as.character(time.new), ":"),
                    function(x) {
                      x<-as.numeric(x)
                      (x[1]*60+x[2]+(x[3]/60)+(x[4]/60000))
                    })

The result looks like this if you bind the two columns together (note that cbind produces a character matrix here, not a data frame):

> df <- cbind(time, time.mins)
> df
     time           time.mins         
[1,] "08:27:23,445" "507.39075"       
[2,] "08:27:24,280" "507.404666666667"
[3,] "08:27:25,115" "507.418583333333"
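
To illustrate the "how long does a single activity take" idea, here is a small sketch that repeats the conversion and then takes the difference between consecutive time stamps. The name gaps.secs is made up for this example.

```r
# Times from the question, with milliseconds after the comma.
time <- c("08:27:23,445", "08:27:24,280", "08:27:25,115")

# Same conversion as above: hours, minutes, seconds and milliseconds
# are split apart and combined into decimal minutes.
time.mins <- sapply(strsplit(gsub(",", ":", time), ":"), function(x) {
  x <- as.numeric(x)
  x[1] * 60 + x[2] + x[3] / 60 + x[4] / 60000
})

# Gap between consecutive time stamps, converted from minutes to seconds.
gaps.secs <- diff(time.mins) * 60
print(round(gaps.secs, 3))
```

This only makes sense as long as all time stamps fall on the same day, since the date is ignored.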

I imagine this might be a bit more helpful for something like click-through rates, or when you don't ever care about a total gap of more than 24 hours.
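
The same splitting idea could also be stretched to cover the second series, so that both end up in one format. The sketch below is only one possible way to do it: the helper names to_seconds and to_label are hypothetical, the regular expression assumes that every string contains exactly one HH:MM:SS part (optionally followed by ",mmm"), and the GMT offset in ts2 is simply ignored.

```r
# Hypothetical helper: seconds since midnight from a string that contains
# HH:MM:SS, optionally followed by a comma and milliseconds.
# Assumption: every element contains such a time; the date and the
# GMT offset found in the second series are ignored.
to_seconds <- function(x) {
  x <- as.character(x)  # the columns in the question are factors
  hms <- regmatches(x, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}(,[0-9]+)?", x))
  hms <- sub(",", ".", hms, fixed = TRUE)
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

# Hypothetical helper: format seconds since midnight as HH:MM:SS,mmm.
to_label <- function(secs) {
  ms <- as.integer(round(secs * 1000))
  sprintf("%02d:%02d:%02d,%03d",
          ms %/% 3600000L, (ms %/% 60000L) %% 60L,
          (ms %/% 1000L) %% 60L, ms %% 1000L)
}

s1 <- to_seconds("08:27:23,445")
s2 <- to_seconds("Fri Apr 19 2013 08:39:41 GMT+0200")

to_label(s1)  # "08:27:23,445"
to_label(s2)  # "08:39:41,000"
s2 - s1       # difference between the two time stamps, in seconds
```

Keeping the numeric seconds around for matching and differences, and producing the formatted strings only for display, avoids parsing the labels again later.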

Upvotes: 1
