Zlo

Reputation: 1170

Parse string with additional characters in format to Date

I have a string variable that I want to parse to class Date. In addition to the day, year and month, the format has other characters like separators (, ), letters and apostrophes (u''), like this:

"u'9', u'2005', u'06'"

I have tried

as.Date(my_data$date, format = '%d %Y %m')

...but it only produces missing values. I was hoping that R would interpret the u'' as a unicode designator, which it doesn't.

How do I strip all those unused characters so that this "u'9', u'2005', u'06'" becomes simply this "9 2005 06"?

Upvotes: 1

Views: 504

Answers (3)

Henrik

Reputation: 67778

You don't need to strip the characters that are not used in the conversion specification. In ?as.Date, the description of the format argument points to ?strptime ("Otherwise, the processing is via strptime"). In the Details section of ?strptime we find that:

"[a]ny character in the format string not part of a conversion specification is interpreted literally"

That is, in the format argument of as.Date, you may include not only the conversion specifications (introduced by %) but also the "other characters".

Furthermore, from ?as.Date:

"Character strings are processed as far as necessary for the format specified: any trailing characters are ignored"

Thus, this works:

as.Date("(u'9', u'2005', u'06')", format = "(u'%d', u'%Y', u'%m")
# [1] "2005-06-09"
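
The same idea should carry over to the exact string from the question (without the parentheses) and to a whole character vector. This is a minimal sketch: the `dates` vector is hypothetical sample data standing in for my_data$date, and it assumes every element follows the same layout.

```r
# Hypothetical sample data standing in for my_data$date from the question
dates <- c("u'9', u'2005', u'06'", "u'23', u'2011', u'11'")

# The u, apostrophes and commas are written into the format literally;
# only %d, %Y and %m are treated as conversion specifications.
parsed <- as.Date(dates, format = "u'%d', u'%Y', u'%m'")
print(parsed)
```

Any element that does not match the layout comes back as NA, so a quick check with anyNA(parsed) is a cheap way to spot rows that need separate handling.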

Upvotes: 4

Dominic Comtois

Reputation: 10401

Try this:

as.Date(gsub("[u',()]","",my_data$date), format = '%d %Y %m')

Example with a single string:

d <- "(u'9', u'2005', u'06')"
d <- gsub("[u',()]","",d)
d.date <- as.Date(d, "%d %Y %m")

Result:

d.date
[1] "2005-06-09"
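
If the set of unwanted characters is not known in advance, a variant is to keep only digits and spaces instead of listing what to remove. This is a sketch under the assumption that the date parts are always separated by a space after cleaning; the name `cleaned` is just for illustration.

```r
d <- "(u'9', u'2005', u'06')"

# Drop everything that is not a digit or a space
cleaned <- gsub("[^0-9 ]", "", d)
print(cleaned)

# The remaining text matches the plain day-year-month format
d.date <- as.Date(cleaned, format = "%d %Y %m")
print(d.date)
```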

Upvotes: 1

thepule

Reputation: 1751

If it is character class, you can try:

library(lubridate)

test <- c("u'9'", "u'2005'", "u'06'")

dym(paste(gsub("u|'", "", test), collapse = "/"))
[1] "2005-06-09 UTC"

Here I strip the "u" and ' characters with gsub, then use lubridate to convert the resulting string to a date. The separator passed to collapse in paste is arbitrary; lubridate can handle pretty much anything as a separator between date parts.

Upvotes: 0
