SiH
SiH

Reputation: 1546

Issues in writing and reading datetime in R

Below is an example, where I calculate travel time between to points. I create a tibble with start time, duration, end_time and time_diff = end_time - start_time. I wrote the tibble using write_csv() and read it again with read_csv()

library(sf)
library(tidygeocoder)
library(osrm)
library(lubridate)

# 1. One World Trade Center, NYC
# 2. Madison Square Park, NYC
adresses <- c("285 Fulton St, New York, NY 10007", 
              "11 Madison Ave, New York, NY 10010")

# geocode the two addresses & transform to {sf} data structure
data <- tidygeocoder::geo(adresses, method = "osm") %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326)

rownames(data) <- c("One World Trafe Center", "Madison Square Park")

# calculate travel time from "One World Trade Center" to "Madison Square Park"
osroute <- osrmTable(src = data["One World Trade Center", ],
                     dst = data["Madison Square Park", ])

tbl_out <- tibble(start_trip = ymd_hms("2018-01-01 08:00:00", tz = "America/New_York"), 
              duration = osroute$durations,
              end_trip = start_trip + 60 * osroute$durations,
              time_diff = difftime(end_trip, start_trip, units = "mins")

write.csv(tbl_out, 
          "sample.csv")

tbl_in <- read_csv("sample.csv")

Here is the screenshot of tbl_out (used to write the data)

enter image description here

Here is the screenshot of tbl_in (read using read_csv)

enter image description here

Can someone help me fix some issues -

  1. Datetime changed from EST to UTC after writing into csv and then reading from csv file
  2. Time_diff and duration have same values. But time_diff is better readable and how can I add unit mins to duration. I don't want to convert it character and paste min.
  3. Is there a way to convert time duration and time_diff to HH:MM and still perform basic operation like addition
  4. Follow up on 3. In case we convert time to HH:MM. Can I write 74.5 mins as 74:30.
  5. time_diff in tbl_in after reading from csv file does not have min

Upvotes: 0

Views: 575

Answers (1)

r2evans
r2evans

Reputation: 161007

  1. The time's timezone is not written to file when you write.csv, so when reading it in there is no clue what TZ it should be. I suggest you always write in one timezone (e.g., "UTC"), and then explicitly cast it when you read it in. You can do this with attr<-:

    now <- Sys.time()
    now
    # [1] "2021-08-27 14:09:59 EDT"
    attr(now, "tzone") <- "UTC"
    now
    # [1] "2021-08-27 18:09:59 UTC"
    
  2. What you're seeing is a "difftime"-class object. You can convert any numeric vector into this; all that that class does is change the print method used for it.

    times <- c(10.4, 19)
    times + 1
    # [1] 11.4 20.0
    times <- structure(times, class="difftime", units="mins")
    times
    # Time differences in mins
    # [1] 10.4 19.0
    times + 1
    # Time differences in mins
    # [1] 11.4 20.0
    
  3. One implementation could be to use data.table::as.ITime:

    times <- c(10.41, 19.01)
    as.ITime(60*times) # ITime assumes seconds for all numbers
    # [1] "00:10:24" "00:19:00"
    as.ITime(60*times) + 60
    # [1] "00:11:24" "00:20:00"
    

    Frankly, I think that may be the best way-forward, even if it gives you HH:MM:SS instead of just HH:MM (perhaps there's a way to customize that ... I don't know offhand). One benefit of this method is that it writes to CSVs as the HH:MM:SS format (which may or may not be perfect):

    write.csv(data.frame(x = as.ITime(60*times)), "foo.csv")
    readLines("foo.csv")
    # [1] "\"\",\"V1\""    "\"1\",00:10:24" "\"2\",00:19:00"
    

    However, if that doesn't fit your needs, then below is a hack that might be sufficient for your needs. (Note that this, when saved to a CSV, is not saved in that format, it is saved as a number.)

    format.my_difftime <- function(x, ..., digits = getOption("digits.my_difftime", 0)) {
      if (is.null(digits)) digits <- getOption("digits.my_difftime", 0) # idk why this is needed
      units <- attr(x, "units")
      if (!is.null(units)) {
        mult <- switch(units,
                       sec=, secs=1/60,
                       min=, mins=1,
                       hour=, hours=60,
                       day=, days=86400,
                       NA)
        if (is.na(mult)) {
          warning("Unrecognized units, ignoring: ", sQuote(units, q = FALSE))
          mult <- 1
        }
        x <- x * mult
      } # else assume 'mins'
      fmt <- paste0("%02i:%0", digits+2+(digits>0), ".0", digits, "f")
      sprintf(fmt, as.integer(x), 60 * (x %% 1))
    }
    print.my_difftime <- function(x, ...) cat(format(x), "\n")
    as.data.frame.my_difftime <- as.data.frame.difftime
    units.my_difftime <- function(x) attr(x, "units")
    `units<-.my_difftime` <- function(x, value) {
      attr(x, "units") <- value
      x
    }
    

    Demonstration:

    times <- c(10.41, 19.01)
    structure(times, class = "my_difftime", units = "sec")
    # 00:10 00:19 
    structure(times, class = "my_difftime", units = "min")
    # 10:25 19:01 
    structure(times, class = "my_difftime", units = "hour")
    # 624:36 1140:36 
    
    options(digits.my_difftime = 3)
    times <- structure(times, class = "my_difftime", units = "min")
    times
    # 10:24.600 19:00.600 
    data.frame(x = times)
    #           x
    # 1 10:24.600
    # 2 19:00.600
    options(digits.my_difftime = 0)
    data.frame(x = times)
    #       x
    # 1 10:25
    # 2 19:01
    dput(data.frame(x = times))
    # structure(list(x = structure(c(10.41, 19.01), class = "my_difftime", units = "min")), class = "data.frame", row.names = c(NA, -2L))
    
  4. (Resolved in 3.)

  5. Similar to #1, write.csv does not include units when it writes to a file; in fact, if it did, then read.csv (and most other CSV-reading functions) might presume that the column is character instead.

    write.csv(data.frame(x = times), "foo.csv") # using my_difftime, not ITime
    readLines("foo.csv")
    # [1] "\"\",\"x\""  "\"1\",10.41" "\"2\",19.01"
    

    The numbers are preserved.

    In the case of this naïve implementation for my_difftime, though, the numbers never change, so the "units" attribute is merely for presentation. That is, when you calculate times, make sure that its units are always "minutes" (in my assumption of the OP/question) or something known.

    From there, for TZ and for my_difftime, once somebody read.csv's the file, they are responsible for properly classing the column.

Upvotes: 1

Related Questions