Reputation: 579
I have a list of files as below.
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv", "MD_KFL_ 201707_ 201906_gelabelt.sav","MD_KFL_201707_201907_gelabelt_V78.csv", "MD_KFL_201707_201907_gelabelt_V78.sav")
I need to grep the file names using only the following three unique names, i.e. 201907, gelabelt, and csv, so that in this case the output is MD_KFL_201707_201907_gelabelt_V78.csv.
Note that the order of the two unique names 201907 and gelabelt can sometimes differ.
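I tried this so far, but since | means "or", it returns every file that contains any one of the three names (here, all four files):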
grep(paste(c('201907', 'gelabelt', 'csv'), collapse = '|'), files, value = T, fixed = F)
I can use
grep('201907_gelabelt_V78.csv', files, value = TRUE)
but the order of the elements in the source keeps changing from month to month.
How can I achieve this in R without having to input the exact string format every time?
Thanks for your inputs.
Upvotes: 2
Views: 89
Reputation: 39737
You can use a combination of sapply and apply, where tt holds the patterns that must all occur in files:
tt <- c("201907", "gelabelt", "\\.csv$")
files[apply(sapply(tt, grepl, files), 1, all)]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
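The sapply/apply approach above relies on sapply returning a matrix, which it does not do when files has length one (apply then fails because its input has no dimensions). A minimal sketch of the same idea that avoids this, assuming a hypothetical helper name match_all (not part of base R), combines the per-pattern results with Reduce:

```r
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv",
           "MD_KFL_ 201707_ 201906_gelabelt.sav",
           "MD_KFL_201707_201907_gelabelt_V78.csv",
           "MD_KFL_201707_201907_gelabelt_V78.sav")

# match_all() is a hypothetical helper name used only for illustration.
match_all <- function(patterns, x) {
  # One logical vector per pattern, each the same length as x.
  hits <- lapply(patterns, grepl, x = x)
  # Element-wise AND: keep only entries that match every pattern.
  x[Reduce(`&`, hits)]
}

result <- match_all(c("201907", "gelabelt", "\\.csv$"), files)
single <- match_all(c("201907", "gelabelt", "\\.csv$"),
                    "gelabelt_MD_KFL_201907.csv")
```

Because each pattern is tested separately, the order in which the names appear in a file name does not matter.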
Or you can use lookaheads, which are non-consuming (zero-width) assertions, in a single regular expression:
files[grep("(?=.*201907)(?=.*gelabelt).*\\.csv$", files, perl=TRUE)]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
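If the names change from month to month, the lookahead pattern can be assembled from a vector rather than typed by hand. This is only a sketch: the variable names tokens and pattern are illustrative, and it assumes the tokens contain no regex metacharacters (otherwise they would need escaping first).

```r
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv",
           "MD_KFL_ 201707_ 201906_gelabelt.sav",
           "MD_KFL_201707_201907_gelabelt_V78.csv",
           "MD_KFL_201707_201907_gelabelt_V78.sav")

# Names that must appear somewhere in the file name, in any order.
tokens <- c("201907", "gelabelt")

# Wrap each token in a lookahead and require a .csv extension at the end.
lookaheads <- paste0("(?=.*", tokens, ")", collapse = "")
pattern <- paste0("^", lookaheads, ".*\\.csv$")

# Lookaheads need the PCRE engine, hence perl = TRUE.
result <- files[grepl(pattern, files, perl = TRUE)]
```

Only tokens needs to be edited each month; the extension check stays anchored at the end of the name.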
Upvotes: 4
Reputation: 389325
I guess you could use an OR pattern here, with one alternative for each order in which the two names can occur:
grep("(201907.*gelabelt|gelabelt.*201907).*csv", files, value = TRUE)
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
This pattern also matches when "gelabelt" occurs first:
grep("(201907.*gelabelt|gelabelt.*201907).*csv", "gelabelt_MD_KFL_201907.csv", value = TRUE)
#[1] "gelabelt_MD_KFL_201907.csv"
Upvotes: 1