hectorgut47

Reputation: 11

Why is R converting date formats erroneously?

I'm getting a strange result when converting a date format from year and week to yyyy/mm/dd. In my dataset I have a variable with dates in the format "2020-W20", covering a wide range of weeks in 2020 and 2021. When I convert it, R turns every date into either 2020/05/15 or 2021/05/15. Any ideas why this might be happening?

vaccine_data <- read.csv(
  "https://opendata.ecdc.europa.eu/covid19/vaccine_tracker/csv/data.csv",
  na.strings = "",
  fileEncoding = "UTF-8-BOM"
)

vaccine_data$date_yyyymmdd <- unique(
  strftime(
    strptime(vaccine_data$YearWeekISO, format = "%Y-W%U"), 
    format = "%Y/%m/%d"
  )
)

Upvotes: 1

Views: 101

Answers (1)

Ben Bolker

Reputation: 226097

This is tricky. There are two problems:

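  • date/date-time conversion routines don't have a concept of "year/week only" dates, so you need to append a specified day of the week (e.g. 0 or 1 for the first day of the week, depending on the convention) to get a date. Without one, strptime() ignores the week number and fills in the missing month and day from the current date, which is presumably why every result came out as 05/15.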
  • the system libraries for date conversion that R uses (it doesn't typically use its own, but wraps system libraries) don't generally implement the ISO week convention. See the details of ?strptime, especially the description of the "%V" format, which says "Accepted but ignored on input", i.e. it just doesn't work. Therefore, I used the stringi package, which uses the independent ICU implementation of date conversions. (It does give a warning that "Formatters %U, %V, %x, %X, %u, %w, %r, %g, %G, %c might not be 100% compatible with ICU", so you should definitely check your results!)
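A quick base-R illustration of the first point, using two made-up week strings rather than the ECDC data. ?strptime documents that an unspecified month or day is taken to be the current one, so the first result depends on the day you run it.

```r
# Made-up week strings in the question's "YYYY-Www" format.
w <- c("2020-W20", "2021-W07")

# A week number with no weekday gives strptime() no month or day to work
# with, so only the year is kept and the month and day default to today's.
x <- strptime(w, format = "%Y-W%U")
format(x, "%Y/%m/%d")

# Appending a weekday (with %w, 0 = Sunday) lets strptime() use the %U week.
y <- strptime("2020-W20-0", format = "%Y-W%U-%w")
as.Date(y)
```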

This seems to work, although you should check the beginning-of-week dates yourself and possibly subtract 1 from them if you're getting Mondays but want Sundays (I haven't checked). (I initially tried appending "-0" to the strings, which works with base R, but stri_datetime_parse() doesn't accept it.)
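One locale-independent way to do the weekday check suggested above, sketched with a few hypothetical sample dates standing in for the parsed vector d:

```r
# Hypothetical sample values standing in for the parsed dates d.
d <- as.Date(c("2020-05-11", "2020-05-18", "2021-01-04"))

# format(d, "%u") gives the ISO weekday number (1 = Monday, ..., 7 = Sunday)
# regardless of locale, unlike weekdays(), whose day names are localized.
table(format(d, "%u"))

# If every date is a Monday (1) but weeks should start on Sunday,
# move each one back by a day.
d_sunday <- d - 1
table(format(d_sunday, "%u"))
```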

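library(stringi)
w <- vaccine_data$YearWeekISO
# translate the strptime-style format into an ICU date-time pattern
sfmt <- stri_datetime_fstr('%Y-W%V-%w')
# append a day of the week ("-1") so each string names a single day,
# then parse with ICU and convert the result to Date
d <- as.Date(stri_datetime_parse(sprintf("%s-1", w), format = sfmt))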

My previous attempt in base R gives NA for "2020-W53", because it uses "%W" (the UK convention) rather than the non-working "%V" format:

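s <- sprintf("%s-0", w)                  # append weekday 0 (Sunday)
d <- strptime(s, format = "%Y-W%W-%w")   # %W: Monday-based UK week numbering
unique(w[which(is.na(d))])               # week strings that failed to parse
## [1] "2020-W53"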

You could replace NA values with your desired date.
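If you'd rather avoid both stringi and the %W convention, ISO weeks can also be computed directly in base R from the ISO 8601 rule that 4 January always falls in week 1. This is a sketch under that assumption, not part of the answer above; iso_week_to_date() is a hypothetical helper name, and it assumes every string has the form "YYYY-Www". Note that %W is not just missing week 53: per ?strptime it counts from the first Monday of the year, which in 2020 is 6 January, while ISO week 1 of 2020 starts on 30 December 2019, so the %W results for 2020 land a week later than the ISO ones.

```r
# Hypothetical helper: convert ISO 8601 week strings such as "2020-W20"
# to a Date using only base R. Assumes the form "YYYY-Www".
iso_week_to_date <- function(x, wday = 1) {
  # wday is the ISO day of the week: 1 = Monday, ..., 7 = Sunday
  year <- as.integer(substr(x, 1, 4))
  week <- as.integer(substr(x, 7, 8))
  # 4 January always lies in ISO week 1
  jan4 <- as.Date(ISOdate(year, 1, 4))
  # ISO weekday of 4 January (1 = Monday, ..., 7 = Sunday)
  jan4_wday <- as.integer(format(jan4, "%u"))
  # Monday that starts ISO week 1 (may fall in late December of the year before)
  week1_monday <- jan4 - (jan4_wday - 1)
  week1_monday + 7 * (week - 1) + (wday - 1)
}

# Mondays of these ISO weeks: 2020-05-11, 2020-12-28, 2021-01-04
iso_week_to_date(c("2020-W20", "2020-W53", "2021-W01"))
```

Applied to the question's data, format(iso_week_to_date(vaccine_data$YearWeekISO), "%Y/%m/%d") gives one yyyy/mm/dd string per row; dropping the unique() call matters there, since a shortened vector would be recycled across the column.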

Upvotes: 1
