I have a problem in R that is killing me! Can you help me?
I found a question in StackOverflow that gave me a very good explanation.
Here is the link: How to parse milliseconds?
I was able to implement the following code that works very well.
z2 <- strptime("10/2/20 11:16:17.682", "%d/%m/%y %H:%M:%OS")
z1 <- strptime("10/2/20 11:16:16.683", "%d/%m/%y %H:%M:%OS")
When I calculate z2-z1, I get Time difference of 0.9989998 secs
Similarly, when I use
z3 <- strptime("130 11:16:16.683", "%j %H:%M:%OS")
z4 <- strptime("130 11:16:18.682", "%j %H:%M:%OS")
When I calculate z4-z3, I get Time difference of 1.999 secs
What is my problem?
The first column has the format 130 18:25:50.408, with millions of rows.
The second column has the format 2020 130 18:25:51.357, which is the same as the first column but with the year 2020 in front.
The first column is also from 2020, but because the year is missing, R assumes the current year.
First question:
How can I subtract one column from the other? I know how to subtract columns in general.
What I do not know is how to subtract these two times when only one of them includes the year.
For example, second time is 2020 130 18:25:51.357 and first time is 130 18:25:50.408
I guess I could do it programmatically by converting the second time to a string and removing the 2020. However, I am hoping that a quicker solution is available using base R or the lubridate package.
Second question,
"%j %H:%M:%OS"
is the format for 130 11:16:16.683
What is the format for 2020 130 18:25:51.357?
As explained before this is working very well:
z3 <- strptime("130 11:16:16.683", "%j %H:%M:%OS")
But, this is NOT working.
z7 <- strptime("2020 130 11:16:16.683", "%y %j %H:%M:%OS")
I solved the second question!
However, I have not figured out yet the first question.
For the second question, the mistake in the format was that instead of %y, I need to write %Y with upper case.
Here is one example:
later <- strptime("2020 130 11:16:17.683", "%Y %j %H:%M:%OS")
earlier <- strptime("2020 130 11:16:16.684", "%Y %j %H:%M:%OS")
difftime(later,earlier,units="secs")
The R result is:
Time difference of 0.9990001 secs
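To make the difference between the two specifiers concrete, here is a small sketch. It assumes a fixed time zone (tz = "UTC", which is not in my original code) so the result does not depend on the local settings: %Y reads a four-digit year, while %y reads only the last two digits.

```r
# %Y expects the full four-digit year; %y expects only the last two digits.
with_century <- strptime("2020 130 11:16:16.683", "%Y %j %H:%M:%OS", tz = "UTC")
two_digit    <- strptime("20 130 11:16:16.683",   "%y %j %H:%M:%OS", tz = "UTC")

# Both describe the same day: day 130 of 2020.
format(with_century, "%Y-%m-%d")
format(two_digit,    "%Y-%m-%d")
```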
At this point, what is pending is the following:
I need to subtract two times that were recorded on the same day in 2020.
The second time includes the year; the first time does not.
later <- strptime("2020 130 11:16:17.683", "%Y %j %H:%M:%OS")
earlier <- strptime("130 11:16:16.684", "%j %H:%M:%OS")
difftime(later,earlier,units="secs")
R produces the following result: Time difference of -31622399 secs
Why? Because it is now 2021 and the year is missing, R parses the vector earlier as a 2021 date.
My columns have millions of rows.
At this point, my guess is that I need to prepend 2020 to the first column with a concatenation or something similar. Is there any other method?
Thank you for your help!
Your object z2 is a POSIXlt object. This means it is stored as a list of the individual components of the time.
print.default(z2)
# $sec
# [1] 17.682
#
# $min
# [1] 16
#
# $hour
# [1] 11
#
# $mday
# [1] 10
#
# $mon
# [1] 1
#
# $year
# [1] 120
#
# $wday
# [1] 1
#
# $yday
# [1] 40
#
# $isdst
# [1] 0
#
# $zone
# [1] "GMT"
#
# $gmtoff
# [1] NA
#
# attr(,"class")
# [1] "POSIXlt" "POSIXt"
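Some of these fields use offsets that are easy to misread: $year counts years since 1900, and $mon and $yday are zero-based. A minimal sketch of turning them back into calendar values, assuming tz = "UTC" so it runs the same everywhere (the variable names are only illustrative):

```r
z2 <- strptime("10/2/20 11:16:17.682", "%d/%m/%y %H:%M:%OS", tz = "UTC")

calendar_year  <- z2$year + 1900  # $year counts years since 1900
calendar_month <- z2$mon + 1      # $mon is zero-based (0 = January)
day_of_year    <- z2$yday + 1     # $yday is zero-based (0 = 1 January)

c(year = calendar_year, month = calendar_month, day_of_year = day_of_year)
```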
When you do a subtraction, z2 - z1, R dispatches this operation to a function called -.POSIXt, which itself calls difftime. This function converts z2 to a POSIXct object, that is, a count of seconds since the beginning of the epoch, by default "1970-01-01".
options("digits" = 16)
print.default(as.POSIXct(z2))
# [1] 1581333377.682
# attr(,"class")
# [1] "POSIXct" "POSIXt"
# attr(,"tzone")
# [1] ""
difftime(z2, z1)
# Time difference of 0.9989998340606689 secs
R, like most software, works with double-precision numbers, so arithmetic is approximate rather than exact. Most software hides this by limiting the number of digits shown. The size of the error depends on the magnitude of the numbers involved: an epoch count of about 1.58 billion seconds leaves less precision for the fractional part than a seconds field below 60 does, so you might prefer to work directly with the list elements of z2 and z1.
print.default(z2$sec - z1$sec)
# [1] 0.9989999999999988
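Note that subtracting the $sec elements only compares the seconds fields, so it is meaningful only when both times fall within the same minute. As a rough sketch of an alternative, the code below estimates the gap between adjacent doubles at the magnitude of the epoch count, then takes the full difference and rounds it to whole milliseconds. It assumes tz = "UTC" and millisecond-resolution input; the variable names are only illustrative.

```r
z1 <- strptime("10/2/20 11:16:16.683", "%d/%m/%y %H:%M:%OS", tz = "UTC")
z2 <- strptime("10/2/20 11:16:17.682", "%d/%m/%y %H:%M:%OS", tz = "UTC")

# Gap between adjacent doubles near the epoch count (53-bit significand).
epoch_seconds <- as.numeric(as.POSIXct(z2))
spacing <- 2^(floor(log2(epoch_seconds)) - 52)  # 2^-22, about 2.4e-07 seconds

# Full difference in seconds, then rounded to whole milliseconds.
raw_diff  <- as.numeric(difftime(z2, z1, units = "secs"))
diff_ms   <- round(raw_diff * 1000)
diff_secs <- diff_ms / 1000
```

Because the spacing is far smaller than a millisecond, rounding to milliseconds removes the floating-point noise without losing information in the data.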
You could therefore apply the time difference using your favourite data.frame tools.
options("digits" = 6)
# character columns
df1 <- data.frame(
  col1 = c("10/2/20 11:16:17.682", "10/2/20 11:16:16.683"),
  col2 = c("130 11:16:16.683", "130 11:16:18.682"),
  stringsAsFactors = FALSE)
library(dplyr)
# convert columns to POSIXlt
df2 <- mutate(df1,
  col1 = strptime(col1, "%d/%m/%y %H:%M:%OS"),
  col2 = strptime(stringr::str_c("2020 ", col2), "%Y %j %H:%M:%OS"),
  diff_days = unclass(difftime(col2, col1, units = "days")))
df2
# col1 col2 diff_days
# 1 2020-02-10 11:16:17 2020-05-09 11:16:16 88.9583
# 2 2020-02-10 11:16:16 2020-05-09 11:16:18 88.9584
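For the two columns described in the question, the same idea can be sketched in base R alone. The sample vectors below are hypothetical stand-ins for the real columns, and tz = "UTC" is an assumption made to keep daylight-saving shifts out of the result. paste(), strptime() and difftime() are all vectorized, so this should scale to long columns.

```r
# Hypothetical sample data in the two formats described in the question.
first_col  <- c("130 18:25:50.408", "130 11:16:16.684")
second_col <- c("2020 130 18:25:51.357", "2020 130 11:16:17.683")

# Prepend the known year so both columns parse with the same format.
fmt <- "%Y %j %H:%M:%OS"
first_time  <- as.POSIXct(strptime(paste("2020", first_col), fmt, tz = "UTC"))
second_time <- as.POSIXct(strptime(second_col, fmt, tz = "UTC"))

# Difference in seconds, rounded to millisecond resolution.
diff_secs <- as.numeric(difftime(second_time, first_time, units = "secs"))
round(diff_secs, 3)
```

Converting to POSIXct before storing the result keeps each column as a plain numeric vector under the hood, which is usually more convenient in a data frame than a POSIXlt list.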