Reputation: 587
I have a dataframe that has a column that is the file name of the imported workbook. The naming of the workbooks isn't standardized but they contain dates. I'd like to extract the dates from all the strings in the YYYY-MM-DD
format so that I can create a date column.
This is an example vector of what the column looks like:
file_name <- c(
"Silly Elephant- September 30, 2011.xls",
"Silly Elephant- September 30, 2010.xls",
"NFL Las Vegas 10-31-16 Revised.xlsx",
"Silly Elephant July 31, 2011 (Revised).xls",
"Silly Elephant Report 1-31-2017.xlsx",
"Silly Elephant- September 30 2016.xlsx",
"Silly Elephant Report Feb 2020.xlsx",
"Silly Elephant Report 9-30-18.xlsx",
"Silly Elephant Report 9-30-2017.xlsx",
"Silly Elephant Report Apr 2019.xlsx",
"Silly Elephant Report Apr 2020.xlsx",
"Silly Elephant Report August 2019.xlsx",
"Silly Elephant Report Dec 2018.xlsx",
"Silly Elephant Report December 2019.xlsx",
"Silly Elephant Report FEB 2019.xlsx",
"Silly Elephant Report Feb 2020.xlsx",
"Silly Elephant Report Jan 2019.xlsx",
"Silly Elephant Report Jan 2020.xlsx",
"Silly Elephant Report July 2019.XLSX",
"Silly Elephant Report June 2019.xlsx",
"Silly Elephant Report Mar 2019.xlsx",
"Silly Elephant Report Mar 2020.xlsx",
"Silly Elephant Report May 2019.xlsx",
"Silly Elephant Report May 2020.xlsx",
"Silly Elephant Report November 2019.xlsx",
"Silly Elephant- June 30 2012.xlsx")
Upvotes: 0
Views: 38
Reputation: 388807
Try lubridate
's parse_date_time
:
as.Date(lubridate::parse_date_time(file_name, c('%B%d%Y', '%m-%d-%y')))
# [1] "2011-09-30" "2010-09-30" "2016-10-31" "2011-07-31"
# [5] "2017-01-31" "2016-09-30" "2020-02-20" "2018-09-30"
# [9] "2017-09-30" "2019-04-20" "2020-04-20" "2019-08-20"
#[13] "2018-12-20" "2019-12-20" "2019-02-20" "2020-02-20"
#[17] "2019-01-20" "2020-01-20" "2019-07-20" "2019-06-20"
#[21] "2019-03-20" "2020-03-20" "2019-05-20" "2020-05-20"
#[25] "2019-11-20" "2012-06-30"
If your vector takes other format of dates you can add them as well.
Upvotes: 1