Sorry for the question; I started using RStudio a month ago and I keep running into things I've never learned. I've checked every website, help page and forum I could find over the past two days and this is driving me crazy.
I have a variable called Release that gives the release date of a song. Some dates follow the format %Y-%m-%d, whereas others only give a year.
I'd like them all to be in the same format, but I'm struggling to modify only the observations that contain just a year.
A brief sample of the original values:
11/11/2011
01/06/2011
1974
1970
16/09/2003
I've imported the data with:
music <- read.csv("music2.csv", header = TRUE, sep = ",", encoding = "UTF-8", stringsAsFactors = FALSE)
And this is how it looks in RStudio:
"2011-11-11" "2011-06-01" "1974" "1970" "2003-09-16"
This is just a sample; the full data set has 2200 observations.
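For reference, a reproducible stand-in for music2.csv can be built with read.csv(text = ...), using the same arguments as the import above. This is only a sketch: the single-column CSV text is an assumption, since the real file's other columns are not shown.

```r
# Hypothetical stand-in for music2.csv: only the Release column, with the sample values above
csv_text <- "Release
2011-11-11
2011-06-01
1974
1970
2003-09-16"

music <- read.csv(text = csv_text, header = TRUE, sep = ",",
                  encoding = "UTF-8", stringsAsFactors = FALSE)

# Release is read as character because it mixes full dates and bare years
str(music$Release)
```

Running dput(music) on a small subset like this gives something that can be pasted straight into a question.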
The code I'm using is:
Modifdates <- ifelse(nchar(music$Release) == 4, paste0("01-01-", music$Release), music$Release)
Modifdates
I get this:
"2011-11-11" "2011-06-01" "01-01-1974" "01-01-1970" "2003-09-16"
I would just like them all to be in the same format, "%Y-%m-%d". How can I do that?
So I tried this:
as.Date(music$Release, format = "%Y-%m-%d")
But I got NAs wherever I had modified the dates.
Could anyone help?
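A minimal sketch of where the NAs come from, and of how the ifelse approach above could be adapted: a bare year such as "1974" does not match "%Y-%m-%d", so the missing month and day have to be appended after the year (year first) rather than prepended. This assumes Release is a character vector, as it is with stringsAsFactors = FALSE.

```r
# Sample values from the question
release <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# A bare year does not match "%Y-%m-%d", so as.Date() returns NA for it
as.Date(release, format = "%Y-%m-%d")

# Append "-01-01" after the year instead of prepending "01-01-",
# so every string follows "%Y-%m-%d" before conversion
fixed <- ifelse(nchar(release) == 4, paste0(release, "-01-01"), release)
as.Date(fixed, format = "%Y-%m-%d")
```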
Update
Using sub, find the dates that consist of a single year (the "(^[0-9]{4}$)" part), use a back-reference to append -01-01 to the end of the string (the "\\1-01-01" part), and finally convert the result to the Date class with as.Date(). Since as.Date() tries the format "%Y-%m-%d" by default, you don't need to specify it:
dat <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")
When dat is of class character:
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
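To see what the regular expression does on its own, here is a small sketch that inspects the match and the intermediate string before as.Date() is applied. The names year_only and step are purely illustrative.

```r
dat <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# The anchored pattern (^ ... $) matches only strings made of exactly four digits
year_only <- grepl("^[0-9]{4}$", dat)
year_only
# [1] FALSE FALSE  TRUE  TRUE FALSE

# Before as.Date(), sub() returns a character vector; "\\1" is the captured year
step <- sub("(^[0-9]{4}$)", "\\1-01-01", dat)
step
# [1] "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
```

Strings that do not match (the full dates) are returned unchanged by sub(), which is why no ifelse() is needed.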
When dat is of class factor, sub automatically coerces it to character for you:
# dat <- as.factor(dat); dat
# 2011-11-11 2011-06-01 1974 1970 2003-09-16
# Levels: 1970 1974 2003-09-16 2011-06-01 2011-11-11
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
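Applied to the question's data, the same call could be written back into the Release column. The small music data frame below is a hypothetical stand-in for the imported CSV.

```r
# Hypothetical stand-in for the imported data (Release column only)
music <- data.frame(
  Release = c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16"),
  stringsAsFactors = FALSE
)

# Complete bare years with "-01-01", then convert the whole column to Date
music$Release <- as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", music$Release))

# The column is now a single Date vector
class(music$Release)
# [1] "Date"
```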
Welcome to SO! Please try to provide a reproducible example next time so that we can help you as best we can. Here I think you could use:
testdates <- c("1974", "12-12-2012")
betterdates <- ifelse(nchar(testdates)==4,paste0("01-01-",testdates),testdates)
> betterdates
[1] "01-01-1974" "12-12-2012"
EDIT: if your vector is a factor, you should convert it with as.character.factor (or simply as.character) first. If you then want to convert it back to a factor, you can use as.factor.
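A minimal sketch of that factor round trip, assuming the test vector above arrives as a factor:

```r
# A factor version of the test vector from this answer
testdates <- factor(c("1974", "12-12-2012"))

# Convert to character first; nchar() and paste0() should work on the
# labels as text, not on the factor
chars <- as.character(testdates)
betterdates <- ifelse(nchar(chars) == 4, paste0("01-01-", chars), chars)

# Optionally convert back to a factor afterwards
as.factor(betterdates)
```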
EDIT2: do not convert with as.Date before doing this; only do it after this modification.
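Following EDIT2, a sketch of the conversion after the modification: the strings produced here are day-month-year, so the format passed to as.Date() has to describe that order. format() can then turn the Dates back into "%Y-%m-%d" strings if that is the desired output.

```r
testdates <- c("1974", "12-12-2012")
betterdates <- ifelse(nchar(testdates) == 4, paste0("01-01-", testdates), testdates)

# betterdates are day-month-year strings, so the format must say so
d <- as.Date(betterdates, format = "%d-%m-%Y")
d
# [1] "1974-01-01" "2012-12-12"

# Print the Dates as "%Y-%m-%d" strings
format(d, "%Y-%m-%d")
```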