Reputation: 1823
I have a problem extracting dates from file names. In my example I have the file.name object:
file.name<- c("AZAMBUJAI002A20190518T133231_20190518T133919_T22JCM_2021_05_19_01_18_22.tif","RINCAODOSSOARES051B20210107T133231_20190518T133919_T22JSM_2021_05_19_01_18_22",
"VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif")
I need to extract the specific dates 20190518, 20210107 and 20181018 from inside the file names into a new object. I can't use substr for this because the area names have different lengths (AZAMBUJAI002A, RINCAODOSSOARES051B and VILAPALMA33K), and I can't simply remove the letters either, because of the numeric area IDs (002, 051 and 33). The dates at the end, before ".tif" and separated by "_", are not useful information.
My desired output is:
mydates
[1] 2019-05-18
[2] 2021-01-07
[3] 2018-10-18
Is there any solution to the problem described? Thanks!!
Upvotes: 0
Views: 786
Reputation: 4497
Here is a way to extract the dates using a regex, assuming every year starts with 20xx:
library(stringr)
library(lubridate)
# year 20xx, then a two-digit month and a two-digit day
date_string <- str_extract(file.name,
"20\\d{2}[01]\\d[0-3]\\d")
date_string
#> [1] "20190518" "20210107" "20181018"
ymd(date_string)
#> [1] "2019-05-18" "2021-01-07" "2018-10-18"
Created on 2021-05-19 by the reprex package (v2.0.0)
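As a variation on the same idea, the match can be anchored on the "T" and six-digit time that follow the date in these names, rather than on the leading "20". This is only a sketch in base R, and it assumes every name contains at least one such date-plus-time stamp; note that regmatches() silently drops names with no match.

```r
# Sample names copied from the question
file.name <- c(
  "AZAMBUJAI002A20190518T133231_20190518T133919_T22JCM_2021_05_19_01_18_22.tif",
  "RINCAODOSSOARES051B20210107T133231_20190518T133919_T22JSM_2021_05_19_01_18_22",
  "VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif"
)

# Eight digits immediately followed by "T" and six more digits.
# Assumption: the wanted date is the first such stamp in each name.
pattern <- "[0-9]{8}(?=T[0-9]{6})"

# regexpr() reports only the first match per string;
# perl = TRUE is needed for the lookahead (?=...)
first_stamp <- regmatches(file.name, regexpr(pattern, file.name, perl = TRUE))

mydates <- as.Date(first_stamp, format = "%Y%m%d")
mydates
```

Because the lookahead is not part of the match, first_stamp holds just the eight date digits.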
Upvotes: 1
Reputation: 6496
library(lubridate)
ymd(gsub("(^.*_)(20[0-9]{2}_)([0-9]{2}_)([0-9]{2}_)(.*$)",
"\\2\\3\\4",
file.name))
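ymd is a lubridate function that parses year-month-day dates, almost irrespective of the separator used. gsub replaces the part of a string matched by a regex. The regex inside splits each name into five capture groups, and the replacement "\\2\\3\\4" keeps only groups 2 to 4 (year, month and day).
The explanation of the code still holds, but the call above picks up the underscore-separated date near the end of each name (2021_05_19 in the first name). To retrieve the date that comes right after the area name, the code needed is this: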
ymd(gsub("(^.*[A-Z])(20[0-9]{2})([0-9]{2})([0-9]{2})(.*$)",
"\\2\\3\\4",
file.name))
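One thing to keep in mind with this approach is that gsub returns its input unchanged when the pattern does not match, so an unexpected file name is passed on whole to the date parser. A minimal sketch of a guard in base R follows; extract_name_date is a hypothetical helper name, and the second test name is made up for illustration.

```r
# Same idea as the regex above: a capital letter, then yyyymmdd starting with 20
pattern <- "^.*[A-Z](20[0-9]{2})([0-9]{2})([0-9]{2}).*$"

# Hypothetical helper (not part of base R or lubridate)
extract_name_date <- function(x) {
  out <- rep(NA_character_, length(x))
  hit <- grepl(pattern, x)                        # which names match at all
  out[hit] <- sub(pattern, "\\1\\2\\3", x[hit])   # keep year, month, day
  as.Date(out, format = "%Y%m%d")                 # non-matching names stay NA
}

res <- extract_name_date(c(
  "VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif",
  "no_date_here.tif"                              # made-up name without a date
))
res
```

The first element should come back as 2018-10-18 and the second as NA, which makes bad names easy to spot with is.na().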
Upvotes: 0
Reputation: 6483
Solution using base R functions. Works as long as the format is always "yyyymmdd" and the relevant string appears before the first underscore:
file.name<- c("AZAMBUJAI002A20190518T133231_20190518T133919_T22JCM_2021_05_19_01_18_22.tif",
"RINCAODOSSOARES051B20210107T133231_20190518T133919_T22JSM_2021_05_19_01_18_22",
"VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif")
Using gsub twice: first (in the inner call) to get rid of everything after the first underscore, and then to extract the sequence of eight digits ([0-9]{8}):
dates <- gsub(".*([0-9]{8}).*", "\\1", gsub("^([^_]*)_.*", "\\1", file.name))
Finally, use as.Date to convert the strings to an R Date object (which can be re-cast to a string using format):
dates_as_actual_date <- as.Date(dates, format = "%Y%m%d")
dates_as_actual_date is an R Date object and looks like this:
[1] "2019-05-18" "2021-01-07" "2018-10-18"
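To illustrate the re-casting mentioned above, here is a short sketch of turning the Date object back into strings with format(). The date strings are typed in directly so the snippet runs on its own, and the output layouts chosen are only examples.

```r
# The three dates from the question, entered directly for a standalone example
dates_as_actual_date <- as.Date(c("20190518", "20210107", "20181018"),
                                format = "%Y%m%d")

# Re-cast to character in another layout
as_text <- format(dates_as_actual_date, "%d/%m/%Y")
as_text
#> [1] "18/05/2019" "07/01/2021" "18/10/2018"

# Pull out a single component, here the year, as a number
years <- as.integer(format(dates_as_actual_date, "%Y"))
years
#> [1] 2019 2021 2018
```

Keeping the Date object around and formatting only for display means sorting and comparisons still work on the real dates.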
Upvotes: 1