Reputation: 57
I'm trying to extract date patterns from a text file using R. I figured I could use str_extract and a regular expression, but I can't quite get a regex that gives me the intended result.
The dates look like
January 21, 2016
March 3, 2019
April 15, 2013
and so on.
My current setup: I loaded the text file into R and stored it in a string variable called mystring, then ran
pattern <- "January|February|March|May|June|July|August|September|October|November|December\\s\\d{1,2},\\s\\d{4}"
str_extract_all(mystring,pattern)
I think I'm on the right track, but I can't quite get it to work. Eventually I also want to convert the extracted strings from character data to dates in the default "2019-03-01" format, but first I need to figure out how to extract them.
Upvotes: 0
Views: 204
Reputation: 866
Why not use lubridate's mdy() function, which converts the string into a date that you can then format as you wish?
library(lubridate)
mdy("January 21, 2016")
[1] "2016-01-21"
Here's an example with multiple random dates:
random_dates <- format(sample(seq(as.Date('2018/01/01'), as.Date('2020/01/01'), by="day"), 12), "%B %d, %Y")
tidy_dates <- mdy(random_dates)
From:
[1] "June 05, 2018" "December 23, 2019" "October 20, 2019" "July 17, 2019" "February 26, 2019" "January 25, 2018"
[7] "August 16, 2018" "February 08, 2019" "July 31, 2019" "May 05, 2019" "November 30, 2018" "March 28, 2018"
To:
[1] "2018-06-05" "2019-12-23" "2019-10-20" "2019-07-17" "2019-02-26" "2018-01-25" "2018-08-16" "2019-02-08" "2019-07-31" "2019-05-05"
[11] "2018-11-30" "2018-03-28"
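As for the extraction step itself: the pattern in the question has two problems. The `|` alternation has the lowest precedence, so the `\\s\\d{1,2},\\s\\d{4}` part attaches only to December and every other month matches as a bare name, and April is missing from the list. Below is a minimal sketch of one way to fix it, assuming stringr and lubridate are installed and an English locale for month names. The contents of `mystring` here are made up for illustration. It builds the month list from base R's `month.name` and wraps it in a non-capturing group.

```r
library(stringr)
library(lubridate)

# Hypothetical sample text standing in for the contents of the loaded file
mystring <- "Signed on January 21, 2016, amended March 3, 2019, and archived April 15, 2013."

# Group the month names so the day and year parts apply to every month,
# not only to the last alternative
months <- paste(month.name, collapse = "|")
pattern <- paste0("(?:", months, ")\\s\\d{1,2},\\s\\d{4}")

# str_extract_all() returns a list with one element per input string
found <- str_extract_all(mystring, pattern)[[1]]

# mdy() parses "Month day, year" strings into Date objects
dates <- mdy(found)
```

`found` then holds the matched strings and `dates` the parsed Date values, which print in the default year-month-day format.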
Upvotes: 1