Santiago I. Hurtado

Reputation: 1123

Possible bug in as.POSIXct

I am working with time data that I read in as strings and convert to the POSIXct class. The conversion works for all of my data except one specific string. In essence, what I do is:

Time1 <- '1900-04-01' # First Year then Month then Day
Time1_convert <- as.POSIXct( Time1, format='%Y-%m-%d')

I do this vectorized, and almost all of my data converts correctly. But with the date 1920-05-01:

Time1 <- '1920-05-01' 
Time1_convert <- as.POSIXct( Time1, format='%Y-%m-%d' )

This returns NA, and I have no idea why. If I add tz = 'GMT' to the as.POSIXct call, every value converts correctly. What I do not understand is why this happens, and why only with this specific value, when I have tried more than 1500 different time values.

I have attached an image of the output in RStudio.

More code added:

for (m in 1:12) {
   # sprintf zero-pads the month, e.g. '1920-05-01'
   print(as.POSIXct(sprintf('1920-%02d-01', m), format = '%Y-%m-%d'))
}

and the output is:

[1] "1920-01-01 CMT"
[1] "1920-02-01 CMT"
[1] "1920-03-01 CMT"
[1] "1920-04-01 CMT"
[1] NA
[1] "1920-06-01 -04"
[1] "1920-07-01 -04"
[1] "1920-08-01 -04"
[1] "1920-09-01 -04"
[1] "1920-10-01 -04"
[1] "1920-11-01 -04"
[1] "1920-12-01 -04"

Output of sessionInfo():

R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

locale:
 [1] LC_CTYPE=es_AR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_AR.UTF-8        LC_COLLATE=es_AR.UTF-8    
 [5] LC_MONETARY=es_AR.UTF-8    LC_MESSAGES=es_AR.UTF-8   
 [7] LC_PAPER=es_AR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

loaded via a namespace (and not attached):
[1] tools_3.3.3

Upvotes: 2

Views: 465

Answers (1)

Jon Spring

Reputation: 66490

Your locale settings (es_AR) suggest you are in Argentina. As it happens, Argentina reset its time zone on that date from UTC-4:16:48 to UTC-4. I think this means that there was no midnight in Argentina on May 1, 1920. When you convert that string to POSIXct without specifying tz, R interprets it as midnight on that day in your local time zone, which happens to be a time that did not exist in Argentina. The change from CMT to -04 in your output between April and June is consistent with this. (It also explains why the problem was not reproducible for others who tried the same code.)

http://www.statoids.com/tar.html

Locations in Argentina observed Local Mean Time until 1894-10-31 00:00 (as measured after the transition). At that moment, the entire country synchronized on Córdoba's Local Mean Time, which was UTC-4:16:48. The next transition occurred at 1920-05-01 00:00, when clocks were set ahead sixteen minutes and forty-eight seconds to be an even UTC-4. Argentina remained unified on UTC-4 until its first daylight saving time was inaugurated in 1931.
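The jump described in that passage can be inspected directly from R. The following is only a sketch: it assumes the session zone is "America/Argentina/Buenos_Aires" (the question does not state the zone name), and what it prints depends on the R version and on whether the system time zone database carries the 1920 rule. The two instants are defined in UTC, where nothing is skipped, and are exactly one second apart.

```r
# Assumption: the affected zone is America/Argentina/Buenos_Aires.
tz_ar <- "America/Argentina/Buenos_Aires"

# Two instants one second apart, defined unambiguously in UTC.
before <- as.POSIXct("1920-05-01 04:16:47", tz = "UTC")
after <- before + 1

# Show the same two instants as wall-clock times in the Argentine zone.
# If the time zone database includes the 1920 change, the two printed
# wall-clock times should differ by much more than one second.
print(format(before, tz = tz_ar, usetz = TRUE))
print(format(after, tz = tz_ar, usetz = TRUE))

# The conversion from the question with the zone made explicit.
# Depending on the R version and tzdata, this may print NA as in the question.
print(as.POSIXct("1920-05-01", format = "%Y-%m-%d", tz = tz_ar))
```

If the local midnight falls inside the skipped interval, there is no instant that the string "1920-05-01" can map to in that zone, which is one way to read the NA.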

If you need a POSIXct object, you might consider:

a) specifying a different time zone where midnight existed on that day.

as.POSIXct("1920-05-01", tz = "UTC") 
# Or perhaps other nearby time zones didn't have that specific problem?

b) storing the time in components, one for the date and one for the time within the day, e.g. time = hour(Time1) + minute(Time1)/60 (using, for example, lubridate's hour() and minute()). It's a little unwieldy, but it may still let you perform the date/time calculations you need.
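Both options can be sketched in base R. This is a rough illustration rather than a recommendation for your exact data: the vector `dates` and the string `stamp` are made-up inputs, and option (b) uses plain string splitting instead of hour() and minute() so that no package is required.

```r
# Option (a): parse in a zone with no gaps, here UTC.
# `dates` is a hypothetical sample that includes the problematic value.
dates <- c("1900-04-01", "1920-04-01", "1920-05-01", "1920-06-01")
as_utc <- as.POSIXct(dates, format = "%Y-%m-%d", tz = "UTC")
print(as_utc)
print(anyNA(as_utc))

# Option (b): keep the calendar date and the time of day as separate values.
# `stamp` is a hypothetical timestamp string in "YYYY-MM-DD HH:MM:SS" form.
stamp <- "1920-05-01 00:10:00"
parts <- strsplit(stamp, " ", fixed = TRUE)[[1]]
day <- as.Date(parts[1])                       # Date class has no time zone
hms <- as.numeric(strsplit(parts[2], ":", fixed = TRUE)[[1]])
hours_in_day <- hms[1] + hms[2] / 60 + hms[3] / 3600  # decimal hours
print(day)
print(hours_in_day)
```

Option (a) treats every string as a UTC timestamp, so the results are only comparable with other values parsed the same way. Option (b) avoids time zones entirely, at the cost of doing the date and time arithmetic by hand.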

Upvotes: 2
