Andres Mora

Reputation: 1106

How to extract dates from a text string?

I'm trying to extract dates from a text string. So far I have made some progress with anydate() from the anytime package. My idea is to extract all dates within a text string into a single comma-separated string, like this:

library(anytime)
library(stringr)

str1 = "08/07/2022 FC 08/15/2022 yusubclavio derecho"
paste0(anydate(str_extract_all(str1, "[[:alnum:]]+[ /]\\d{2}[ /]\\d{4}")[[1]]), collapse = ", ")
[1] "2022-08-07, 2022-08-15"

My problems start when the date format is DD/MM/YYYY:

str1 = "22/08/2022 FC yusubclavio derecho"
paste0(anydate(str_extract_all(str1, "[[:alnum:]]+[ /]\\d{2}[ /]\\d{4}")[[1]]), collapse = ", ")
[1] ""

Upvotes: 2

Views: 479

Answers (2)

ThomasIsCoding

Reputation: 101343

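A base R option is to combine scan() and grep(). The output below is for the three-element str1 vector given in the data block of the other answer: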

> grep("^(\\d|/)+$", scan(text = str1, what = "", quiet = TRUE), value = TRUE)
[1] "08/22/22"   "22/08/2022" "08/07/2022" "08/15/2022"
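The tokens above are still character strings. As a minimal base R sketch of how they could be turned into the comma-separated result the question asks for, the hypothetical helper extract_dates() below (not part of any package) tries the month-first format and falls back to day-first whenever month-first fails. It assumes a single input string and four-digit years, and a date such as 08/07/2022, which is valid in both orders, is always read as month-first.

```r
# Hypothetical helper, not from any package: pull every d/d/yyyy token out of
# one string and return the parsed dates as a comma-separated string.
extract_dates <- function(x) {
  # Base R equivalent of str_extract_all(); assumes x is a single string
  tokens <- regmatches(x, gregexpr("\\d{1,2}/\\d{1,2}/\\d{4}", x))[[1]]

  # Try month-first, then day-first; as.Date() yields NA when a format
  # does not fit (e.g. month 22)
  mdy <- as.Date(tokens, format = "%m/%d/%Y")
  dmy <- as.Date(tokens, format = "%d/%m/%Y")

  # Keep the month-first result where it parsed, otherwise use day-first
  parsed <- mdy
  parsed[is.na(mdy)] <- dmy[is.na(mdy)]

  paste(format(parsed, "%Y-%m-%d"), collapse = ", ")
}

extract_dates("08/07/2022 FC 08/15/2022 yusubclavio derecho")
extract_dates("22/08/2022 FC yusubclavio derecho")
```

If the source of the text is known to use one convention only, dropping the fallback and keeping a single format string avoids silently misreading ambiguous dates.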

Upvotes: 0

akrun

Reputation: 887118

We could use parse_date() from parsedate. It should be able to parse most date formats, but a two-digit year can be ambiguous, e.g. if the '22' is meant to be 1922 rather than 2022.

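library(parsedate)
library(stringr)  # for str_extract_all()

# extract every d/d/d token from all strings, then parse and drop the time part
as.Date(parse_date(unlist(str_extract_all(str1, "\\d+/\\d+/\\d+"))))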

-output

[1] "2022-08-22" "2022-08-22" "2022-08-07" "2022-08-15"

data

str1 <- c("08/22/22 FC yusubclavio derecho", "22/08/2022 FC yusubclavio derecho", 
"08/07/2022 FC 08/15/2022 yusubclavio derecho")
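On the two-digit-year caveat: if the century rule needs to be explicit, one option is to skip format guessing and choose the format string from the width of the year. The sketch below uses only base R. parse_mdy() is a hypothetical helper name, and it assumes every token is month-first. Per the R documentation for strptime, `%y` currently maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, so a '22' becomes 2022; a token meant as 1922 would need separate handling.

```r
# Hypothetical helper: parse month-first tokens, choosing %Y or %y
# according to whether the year has four digits.
parse_mdy <- function(tokens) {
  four_digit <- grepl("/\\d{4}$", tokens)

  # Start with an all-NA Date vector of the right length
  out <- rep(as.Date(NA), length(tokens))

  # Four-digit years are taken literally
  out[four_digit] <- as.Date(tokens[four_digit], format = "%m/%d/%Y")

  # Two-digit years follow the %y century rule described above
  out[!four_digit] <- as.Date(tokens[!four_digit], format = "%m/%d/%y")

  out
}

parse_mdy(c("08/22/22", "08/07/2022", "08/15/2022"))
```

Separating the two cases matters because parsing "08/22/22" with `%Y` would read the year as 22 rather than 2022.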

Upvotes: 1
