Yellow_truffle

Reputation: 923

Extracting "Year", "Month" and "Day" from a Date column stored as a continuous string

Hi, I have a data frame in the form shown below:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7), Date = c("20200230", 
"20200422", "20100823", "20190801", "20130230", "20160230", "20150627"
)), class = "data.frame", row.names = c(NA, -7L))

  ID     Date
1  1 20200230
2  2 20200422
3  3 20100823
4  4 20190801
5  5 20130230
6  6 20160230
7  7 20150627

The dates in the Date column are not in a standard format; they are written as yyyymmdd. How can I separate the year, month and day from the Date column and save them as new columns in the data frame, so that the result looks like this?

  ID     Date   Year  Month  Day
1  1 20200230   2020   02     30
2  2 20200422   2020   04     22
3  3 20100823  ....................
4  4 20190801  ....................
5  5 20130230  ....................
6  6 20160230  ....................
7  7 20150627  ....................

I tried using format(as.Date(x, format="%YYYY%mm/%dd"),"%YYYY") but it didn't work for me. I also tried the following code:

library(lubridate)  # year() and ymd() come from lubridate
Data$Year <- year(ymd(Data$Date))

The result is in this form:

  ID     Date Year
1  1 20200230   NA
2  2 20200422 2020
3  3 20100823 2010
4  4 20190801 2019
5  5 20130230   NA
6  6 20160230   NA
7  7 20150627 2015

As @neilfws mentioned, I get NA because those dates are not valid (e.g. 20200230 would be February 30). However, I don't care about validity and want to extract the year in any case.
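For context, here is a small base R sketch of why date parsing cannot help here. It assumes the dates are character strings, as in the dput above. The format string that describes yyyymmdd input is "%Y%m%d" (the "%YYYY%mm/%dd" pattern does not match this input). Even with the right format, base R's as.Date() returns NA for impossible calendar dates such as February 30, so the year is lost too.

```r
# Two values taken from the question's Date column
x <- c("20200230", "20200422")

# "%Y%m%d" = 4-digit year, 2-digit month, 2-digit day
parsed <- as.Date(x, format = "%Y%m%d")
print(parsed)

# Formatting back to a year string: the invalid first date stays NA
years <- format(parsed, "%Y")
print(years)
```

This is why both answers below work on the raw string instead of parsing it as a date.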

Upvotes: 0

Views: 103

Answers (2)

hello_friend

Reputation: 5788

Base R in one expression:

# If you want to keep the Date vector: 
cbind(df, 
  strcapture(pattern = "^(\\d{4})(\\d{2})(\\d{2})$",
    x = df$Date,
    proto = list(year = integer(), month = integer(), day = integer())))

# If you want to drop the Date vector: 
cbind(within(df, rm(Date)),
      strcapture(pattern = "^(\\d{4})(\\d{2})(\\d{2})$",
        x = df$Date,
        proto = list(year = integer(), month = integer(), day = integer())))
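With integer() in proto, "02" becomes 2. If you want to keep the leading zeros shown in the desired output, a variant is to pass character() prototypes instead; strcapture() then returns the captured text unchanged. A minimal sketch, using a trimmed copy of the question's data (the capitalized column names Year/Month/Day are just a choice to match the desired output):

```r
df <- data.frame(ID = c(1, 2, 5),
                 Date = c("20200230", "20200422", "20130230"),
                 stringsAsFactors = FALSE)

# Same regex as above; character prototypes keep "02" instead of 2
parts <- strcapture(
  pattern = "^(\\d{4})(\\d{2})(\\d{2})$",
  x = df$Date,
  proto = list(Year = character(), Month = character(), Day = character())
)
result <- cbind(df, parts)
print(result)
```

Values that do not match the 8-digit pattern come back as NA in all three new columns rather than raising an error.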

Upvotes: 1

neilfws

Reputation: 33782

If you only want the year and are not concerned with date validation, the easiest solution is probably to extract the first 4 characters from Date and convert to numeric.

Data$Year <- as.numeric(substring(Data$Date, 1, 4))
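The same fixed-position idea extends to month and day, since in a yyyymmdd string the year is characters 1-4, the month 5-6 and the day 7-8. A minimal sketch, assuming Date is stored as character (as in the question's dput); the three-row Data is a trimmed copy of the question's data. Month and Day are left as character so leading zeros like "02" survive; wrap them in as.numeric() if you need numbers.

```r
Data <- data.frame(ID = 1:3,
                   Date = c("20200230", "20200422", "20100823"),
                   stringsAsFactors = FALSE)

# Fixed character positions in yyyymmdd
Data$Year  <- substring(Data$Date, 1, 4)
Data$Month <- substring(Data$Date, 5, 6)
Data$Day   <- substring(Data$Date, 7, 8)
print(Data)
```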

It might be good to add some kind of check on Date, e.g. that every value contains exactly 8 digits.
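One way to do that check is a regular expression that only accepts strings made of exactly 8 digits. The helper name check_yyyymmdd below is hypothetical, just for illustration. Note that it only checks the shape of the string, not calendar validity, so 20200230 still passes, which matches what the question asks for.

```r
# Hypothetical helper: stop if any value is not exactly 8 digits
check_yyyymmdd <- function(x) {
  ok <- grepl("^[0-9]{8}$", x)
  if (!all(ok)) {
    stop("Not 8 digits: ", paste(x[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

check_yyyymmdd(c("20200230", "20150627"))  # passes silently
```

Calling it before the substring() step means malformed values such as "2020-02-30" fail loudly instead of producing wrong years.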

Upvotes: 3
