grep multiple characters in r

Question

I have a list of files as below.

files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv", "MD_KFL_ 201707_ 201906_gelabelt.sav","MD_KFL_201707_201907_gelabelt_V78.csv", "MD_KFL_201707_201907_gelabelt_V78.sav")

I need to grep the file names using only the following three unique names, i.e. 201907, gelabelt, and csv, so that in this case the output is MD_KFL_201707_201907_gelabelt_V78.csv.

Note that the order of the two unique names 201907 and gelabelt can differ sometimes.

I tried this so far, but because | means "or", it returns every file that contains any one of the names:

grep(paste(c('201907', 'gelabelt', 'csv'), collapse = '|'), files, value = TRUE, fixed = FALSE)

I can use

grep('201907_gelabelt_V78.csv', files, value = TRUE)

but the order of the elements in the source keeps changing month on month.

How can I achieve this in R without having to input the exact string format every time?

Thanks for your inputs.

GKi · Accepted Answer

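You can combine sapply and apply, where tt holds the patterns that must all match each file name. sapply builds a logical matrix with one row per file and one column per pattern, and apply keeps the rows where every column is TRUE: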

tt <- c("201907", "gelabelt", "\\.csv$")
files[apply(sapply(tt, grepl, files), 1, all)]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
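One caveat with the sapply/apply form: when files has only one element, sapply returns a plain vector instead of a matrix, and apply then fails. A minimal sketch of an alternative that avoids the matrix, assuming the same files and tt as above (the name hits is only illustrative):

```r
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv",
           "MD_KFL_ 201707_ 201906_gelabelt.sav",
           "MD_KFL_201707_201907_gelabelt_V78.csv",
           "MD_KFL_201707_201907_gelabelt_V78.sav")
tt <- c("201907", "gelabelt", "\\.csv$")

# One logical vector per pattern, combined element-wise with `&`,
# so a file is kept only if every pattern matches it.
hits <- Reduce(`&`, lapply(tt, grepl, x = files))
files[hits]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
```

Because the patterns are tested independently, the order in which 201907 and gelabelt appear in the file name does not matter.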

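Alternatively, use a single regular expression with lookaheads, which are non-consuming (zero-width) assertions, so the names can appear in any order: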

files[grep("(?=.*201907)(?=.*gelabelt).*\\.csv$", files, perl=TRUE)]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
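Since the names change from month to month, the lookahead pattern could also be built from a vector instead of typed by hand. A rough sketch, where match_all is a hypothetical helper (not part of base R) and the tokens are assumed to contain no regex metacharacters:

```r
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv",
           "MD_KFL_ 201707_ 201906_gelabelt.sav",
           "MD_KFL_201707_201907_gelabelt_V78.csv",
           "MD_KFL_201707_201907_gelabelt_V78.sav")

# Hypothetical helper: one lookahead per token, then anchor the extension.
# Tokens are inserted into the regex as-is (assumed free of metacharacters).
match_all <- function(x, tokens, ext) {
  lookaheads <- paste0("(?=.*", tokens, ")", collapse = "")
  pattern <- paste0("^", lookaheads, ".*\\.", ext, "$")
  grep(pattern, x, perl = TRUE, value = TRUE)
}

match_all(files, c("201907", "gelabelt"), "csv")
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
```

Next month only the tokens argument needs to change, for example c("201908", "gelabelt").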
