Reputation: 771
I want to import a CSV file
today,color
01/02,blue
01/04,green
03/14,orange
07/04,red
using readr to create a column of date objects.
library(tidyverse)
library(lubridate)
read_csv("test.csv", col_types = "Dc") #first attempt
read_csv("test.csv", col_types = cols( #second attempt
col_date(format = "%m/%d"),
col_character()))
I figured that my first attempt didn't work because of the non-standard date format, so in my second attempt, I was explicit. Neither succeeded, and both returned the same warning.
Warning: 4 parsing failures.
row col expected actual file
1 today valid date 01/02 'test.csv'
2 today valid date 01/04 'test.csv'
3 today valid date 03/14 'test.csv'
4 today valid date 07/04 'test.csv'
# A tibble: 4 x 2
today color
<date> <chr>
1 NA blue
2 NA green
3 NA orange
4 NA red
How should I structure this import?
Upvotes: 0
Views: 1850
Reputation: 269461
The real problem here is that what we have is not a Date. A Date has a year and the input in the question has no year.
1) To work around this, we can define a special class that accepts a month and day without a year in the required format. We assume the year should default to the current year, which is what as.Date fills in when the format omits it. Use it with read.csv, since its colClasses argument can work with arbitrary S4 classes.
Lines is defined in the Note at the end. Replace text = Lines with the file name to read from a file.
setClass("mmdd")
ch2mmdd <- function(from) as.Date(from, format = "%m/%d")
setAs("character", "mmdd", ch2mmdd)
read.csv(text = Lines, colClasses = c("mmdd", "character"))
giving:
today color
1 2021-01-02 blue
2 2021-01-04 green
3 2021-03-14 orange
4 2021-07-04 red
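Because the year above comes from the system clock, the result changes depending on when the code is run. If a fixed year is wanted instead, the same read-time conversion can be sketched as follows. This is only an illustration under the assumption that every row belongs to 2021; the class name mmdd2021 is hypothetical and not part of the original answer.

```r
library(methods)  # setClass() and setAs() are defined here

# Sample data from the question, inlined so the sketch is self-contained
Lines <- "today,color
01/02,blue
01/04,green
03/14,orange
07/04,red"

# Hypothetical virtual class, used only as a label for colClasses
setClass("mmdd2021")

# Assumption: every row belongs to 2021, so prepend that year before parsing
setAs("character", "mmdd2021", function(from) {
  as.Date(paste0("2021/", from), format = "%Y/%m/%d")
})

# read.csv converts the first column through the coercion registered above
df <- read.csv(text = Lines, colClasses = c("mmdd2021", "character"))
```

With the year hard-coded in the coercion, df$today is a Date column that no longer depends on the current date.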
2) Alternatively, use read_csv and convert afterwards. This uses the ch2mmdd function from (1) but does not need the associated S4 class. On the other hand, the conversion happens after reading, whereas the question seems to want it done as the data is read in, as in (1).
Lines %>%
read_csv %>%
mutate(today = ch2mmdd(today))
Note
Lines <- "today,color
01/02,blue
01/04,green
03/14,orange
07/04,red"
Upvotes: 1
Reputation: 886948
The column is not in a date format, so col_date won't work; a Date needs a year as well. Instead, it is better to read the column as character
df1 <- read_csv("test.csv", col_types = "cc")
Then, add the year part as needed and convert to Date class
library(lubridate)
# the values are month/day, so parse the pasted string as month/day/year
df1$today <- mdy(paste0(df1$today, "/2021"))
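The same read-as-character-then-convert idea can also stay within readr by using parse_date. A minimal sketch, assuming every row belongs to 2021; csv_text and assumed_year are illustrative names that do not appear in the question.

```r
library(readr)

# Sample data from the question, inlined so the sketch is self-contained;
# readr treats a string containing a newline as literal data
csv_text <- "today,color
01/02,blue
01/04,green
03/14,orange
07/04,red
"

# Assumption: the year is known in advance and is the same for every row
assumed_year <- 2021

# Read both columns as character so no date parsing is attempted on import
df1 <- read_csv(
  csv_text,
  col_types = cols(today = col_character(), color = col_character())
)

# Prepend the year, then parse with a format that includes %Y
df1$today <- parse_date(
  paste(assumed_year, df1$today, sep = "/"),
  format = "%Y/%m/%d"
)
```

Naming the columns inside cols() also makes it explicit which specification applies to which column.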
Upvotes: 1