Nick Vence

Reputation: 771

Importing dates with readr::read_csv()

I want to import a CSV file

today,color
01/02,blue
01/04,green
03/14,orange
07/04,red

using readr to create a column of date objects.

library(tidyverse)
library(lubridate)

read_csv("test.csv", col_types = "Dc") #first attempt
read_csv("test.csv", col_types = cols( #second attempt
         col_date(format = "%m/%d"),
         col_character()))

I figured that my first attempt didn't work because of the non-standard date format, so in my second attempt, I was explicit. Neither succeeded, and both returned the same warning.

Warning: 4 parsing failures.
row   col   expected actual       file
  1 today valid date  01/02 'test.csv'
  2 today valid date  01/04 'test.csv'
  3 today valid date  03/14 'test.csv'
  4 today valid date  07/04 'test.csv'
# A tibble: 4 x 2
  today      color
  <date>     <chr>
1 NA         blue
2 NA         green
3 NA         orange
4 NA         red

How should I structure this import?

Upvotes: 0

Views: 1850

Answers (2)

G. Grothendieck

Reputation: 269461

The real problem here is that what we have is not a Date. A Date has a year and the input in the question has no year.

1) To overcome this, we can define a special class that accepts a month and day without a year in the required format. We assume that the year should default to the current year. Use it with read.csv, whose colClasses argument accepts any class for which an "as" conversion from character has been defined.

Lines is defined in the Note at the end. Replace text=Lines with the filename to read from a file.

setClass("mmdd")
ch2mmdd <- function(from) as.Date(from, format = "%m/%d")
setAs("character", "mmdd", ch2mmdd)

read.csv(text = Lines, colClasses = c("mmdd", "character"))

giving (the year is filled in from the current date, here 2021):

       today  color
1 2021-01-02   blue
2 2021-01-04  green
3 2021-03-14 orange
4 2021-07-04    red
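
Because the year above is filled in from the system clock, the result changes depending on when the code is run. A minimal sketch of a variant that pins the year explicitly follows; md2date and its year argument are hypothetical names introduced here for illustration, not part of the original answer, and 2021 is an assumed default.

```r
# Hypothetical helper (not from the original answer): convert "mm/dd" strings
# to Date using an explicitly supplied year instead of the current one.
md2date <- function(x, year = 2021) {
  # Append the year so the string is a complete date, then parse it.
  as.Date(paste0(x, "/", year), format = "%m/%d/%Y")
}

md2date(c("01/02", "03/14"))
md2date("07/04", year = 2020)
```

The same function could be passed to setAs() in place of ch2mmdd if a fixed year is wanted at read time.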

2) Alternatively, use read_csv and convert the column afterwards. This uses the ch2mmdd function from (1) but does not need the associated S4 class. The trade-off is that the conversion happens after the import, whereas the question seems to want it done while the file is read, as in (1).

Lines %>%
  read_csv %>%
  mutate(today = ch2mmdd(today))

Note

Lines <- "today,color
01/02,blue
01/04,green
03/14,orange
07/04,red"

Upvotes: 1

akrun

Reputation: 886948

The values are not complete dates, so col_date won't work: a date needs a year as well. Instead, it is better to read the column as character

df1 <- read_csv("test.csv", col_types = "cc") 

Then add the year as needed and convert to Date class

library(lubridate)
df1$today <- mdy(paste0(df1$today, "/2021"))  # values are month/day, so parse as month/day/year
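
As a possible alternative that stays within readr, the sketch below reads both columns as character and then parses them with parse_date() and an explicit month/day/year format. It recreates the question's data in a temporary file so it can run on its own; the year 2021 is an assumption, not something present in the data.

```r
library(readr)

# Recreate the question's file in a temporary location so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("today,color",
             "01/02,blue",
             "01/04,green",
             "03/14,orange",
             "07/04,red"), tmp)

# Read both columns as character so nothing is parsed as a date yet.
df1 <- read_csv(tmp, col_types = "cc")

# Append an assumed year (2021) and parse with a month/day/year format.
df1$today <- parse_date(paste0(df1$today, "/2021"), format = "%m/%d/%Y")
df1
```

Spelling out the format keeps the month-first order of the input visible in the code.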

Upvotes: 1
