gm007
gm007

Reputation: 587

Convert date piece of string into a date

I have a dataframe that has a column that is the file name of the imported workbook. The naming of the workbooks isn't standardized but they contain dates. I'd like to extract the dates from all the strings in the YYYY-MM-DD format so that I can create a date column.

This is an example vector of what the column looks like:

file_name <- c(
"Silly Elephant- September 30, 2011.xls",
"Silly Elephant- September 30, 2010.xls",
"NFL Las Vegas 10-31-16 Revised.xlsx",
"Silly Elephant July 31, 2011 (Revised).xls",
"Silly Elephant Report 1-31-2017.xlsx",
"Silly Elephant- September 30 2016.xlsx",
"Silly Elephant Report Feb 2020.xlsx",
"Silly Elephant Report 9-30-18.xlsx",                    
"Silly Elephant Report 9-30-2017.xlsx",                 
"Silly Elephant Report Apr 2019.xlsx",                  
"Silly Elephant Report Apr 2020.xlsx",                  
"Silly Elephant Report August 2019.xlsx",                
"Silly Elephant Report Dec 2018.xlsx",                  
"Silly Elephant Report December 2019.xlsx",              
"Silly Elephant Report FEB 2019.xlsx",                  
"Silly Elephant Report Feb 2020.xlsx",                   
"Silly Elephant Report Jan 2019.xlsx",                  
"Silly Elephant Report Jan 2020.xlsx",                   
"Silly Elephant Report July 2019.XLSX",                 
"Silly Elephant Report June 2019.xlsx",                  
"Silly Elephant Report Mar 2019.xlsx",                  
"Silly Elephant Report Mar 2020.xlsx",                   
"Silly Elephant Report May 2019.xlsx",                  
"Silly Elephant Report May 2020.xlsx",                   
"Silly Elephant Report November 2019.xlsx",
"Silly Elephant- June 30 2012.xlsx")

Upvotes: 0

Views: 38

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388807

Try lubridate's parse_date_time :

as.Date(lubridate::parse_date_time(file_name, c('%B%d%Y', '%m-%d-%y')))

# [1] "2011-09-30" "2010-09-30" "2016-10-31" "2011-07-31"
# [5] "2017-01-31" "2016-09-30" "2020-02-20" "2018-09-30"
# [9] "2017-09-30" "2019-04-20" "2020-04-20" "2019-08-20"
#[13] "2018-12-20" "2019-12-20" "2019-02-20" "2020-02-20"
#[17] "2019-01-20" "2020-01-20" "2019-07-20" "2019-06-20"
#[21] "2019-03-20" "2020-03-20" "2019-05-20" "2020-05-20"
#[25] "2019-11-20" "2012-06-30"

If your vector takes other format of dates you can add them as well.

Upvotes: 1

Related Questions