Reputation: 11
I'm getting a strange result when converting a date from year-and-week format to yyyy/mm/dd. My dataset has a variable with dates in the format "2020-W20", covering a wide variety of weeks in 2020 and 2021. When I convert it, R turns every date into either 2020/05/15 or 2021/05/15. Any idea why this might be happening?
vaccine_data <- read.csv(
"https://opendata.ecdc.europa.eu/covid19/vaccine_tracker/csv/data.csv",
na.strings = "",
fileEncoding = "UTF-8-BOM"
)
vaccine_data$date_yyyymmdd <- unique(
strftime(
strptime(vaccine_data$YearWeekISO, format = "%Y-W%U"),
format = "%Y/%m/%d"
)
)
Upvotes: 1
Views: 101
Reputation: 226097
This is tricky. There are two problems:
1. A year and a week number alone do not identify a date; you also need a day of the week. Without one, strptime appears to ignore the week and fill in the current month and day, which would explain why every row comes out as 2020/05/15 or 2021/05/15.
2. R's strptime cannot parse ISO 8601 week numbers. See ?strptime, especially the description of the "%V" format, where it says "Accepted but ignored on input", i.e. it just doesn't work.
Therefore, I used the stringi package, which uses the independent ICU implementation of date conversions. (It does give a warning that "Formatters %U, %V, %x, %X, %u, %w, %r, %g, %G, %c might not be 100% compatible with ICU", so you should definitely check your results!)
This seems to be a solution, although you should check the beginning-of-week dates yourself and possibly subtract 1 from them if you're getting Mondays but want Sundays (I haven't checked). (I initially tried appending "-0" to the strings, which works in base R, but the stringi function doesn't like it, so the code below appends "-1" instead.)
library(stringi)
w <- vaccine_data$YearWeekISO
sfmt <- stri_datetime_fstr('%Y-W%V-%w')
d <- as.Date(stri_datetime_parse(
sprintf("%s-1",w), format = sfmt))
Previous attempt in R, which gives NA values for "2020-W53" (because I was using "%W", the UK convention, rather than the non-working "%V" format):
s <- sprintf("%s-0",w)
d <- strptime(s, format="%Y-W%W-%w")
unique(w[which(is.na(d))])
## [1] "2020-W53"
You could replace NA values with your desired date.
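As a cross-check on the stringi results, or as a way to fill in those NA values, here is a minimal base-R sketch built on the ISO 8601 rule that week 1 is the week containing 4 January. The function name iso_week_to_date and its weekday argument are my own (not from any package), and the sketch assumes every input looks like "2020-W20".

```r
# Hypothetical helper (not from any package): convert "YYYY-Www" strings to the
# date of a given ISO weekday (1 = Monday ... 7 = Sunday) within that week.
iso_week_to_date <- function(x, weekday = 1L) {
  year <- as.integer(substr(x, 1, 4))
  week <- as.integer(sub("^[0-9]{4}-W", "", x))
  # ISO 8601: week 1 is the week that contains 4 January.
  jan4 <- as.Date(sprintf("%d-01-04", year), format = "%Y-%m-%d")
  # On output, "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday).
  jan4_wday <- as.integer(format(jan4, "%u"))
  week1_monday <- jan4 - (jan4_wday - 1L)
  week1_monday + 7L * (week - 1L) + (weekday - 1L)
}

w <- c("2020-W20", "2020-W53", "2021-W01")
d <- iso_week_to_date(w)
format(d, "%Y/%m/%d")
## [1] "2020/05/11" "2020/12/28" "2021/01/04"

# "%G" and "%V" are ignored on input but can be used on output,
# so formatting the dates back to year-week is a quick sanity check.
roundtrip_ok <- format(d, "%G-W%V") == w
```

With the original data this would be applied as format(iso_week_to_date(vaccine_data$YearWeekISO), "%Y/%m/%d"). Pass weekday = 7L if you want the Sunday that ends each ISO week rather than the Monday that starts it.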
Upvotes: 1