der_radler

Reputation: 579

grep multiple strings in R

I have a list of files as below.

files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv", "MD_KFL_ 201707_ 201906_gelabelt.sav","MD_KFL_201707_201907_gelabelt_V78.csv", "MD_KFL_201707_201907_gelabelt_V78.sav")

I need to select the file names that contain all three of the following strings: 201907, gelabelt and csv. In this case the expected output is MD_KFL_201707_201907_gelabelt_V78.csv.

Note that the order of the two unique names 201907 and gelabelt can differ sometimes.

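I tried this so far, but the | alternation matches any file containing at least one of the names, so all four files are returned:

grep(paste(c('201907', 'gelabelt', 'csv'), collapse = '|'), files, value = TRUE, fixed = FALSE)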

I can use

grep('201907_gelabelt_V78.csv', files, value = TRUE)

but the order of the elements in the source keeps changing month on month.

How can I achieve this in R without having to type the exact string every time?

Thanks for your inputs.

Upvotes: 2

Views: 89

Answers (2)

GKi

Reputation: 39717

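You can use a combination of sapply and apply, where tt holds the patterns that must all occur in a file name. sapply builds a logical matrix with one row per file and one column per pattern, and apply keeps the rows where every column is TRUE: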

tt  <- c("201907", "gelabelt", "\\.csv$")
files[apply(sapply(tt, grepl, files), 1, all)]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
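The same "all patterns must match" idea can also be sketched with lapply and Reduce, which avoids the matrix and therefore still works when files has length one (in that case sapply returns a plain vector and apply fails). The helper name match_all is hypothetical and not part of the original answer.

```r
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv",
           "MD_KFL_ 201707_ 201906_gelabelt.sav",
           "MD_KFL_201707_201907_gelabelt_V78.csv",
           "MD_KFL_201707_201907_gelabelt_V78.sav")

# match_all is a hypothetical helper: keep the elements of x that
# match every regular expression in patterns, in any order.
match_all <- function(patterns, x) {
  # one logical vector per pattern, each as long as x
  hits <- lapply(patterns, grepl, x = x)
  # combine them element-wise with logical AND
  x[Reduce(`&`, hits)]
}

result <- match_all(c("201907", "gelabelt", "\\.csv$"), files)
result
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
```

Only the month string has to change from one run to the next, for example match_all(c("201908", "gelabelt", "\\.csv$"), files).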

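Alternatively, use a regular expression with lookaheads. These are non-consuming, so each one is checked from the start of the string and the order of the names does not matter: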

files[grep("(?=.*201907)(?=.*gelabelt).*\\.csv$", files, perl=TRUE)]
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
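If the list of required names changes from month to month, the lookahead pattern does not have to be typed by hand. A minimal sketch, assuming the names contain no regex metacharacters (the variable names keys and pattern are illustrative only):

```r
files <- c("MD_KFL_ 201707_ 201906_gelabelt.csv",
           "MD_KFL_ 201707_ 201906_gelabelt.sav",
           "MD_KFL_201707_201907_gelabelt_V78.csv",
           "MD_KFL_201707_201907_gelabelt_V78.sav")

# names that must appear somewhere in the file name, in any order
keys <- c("201907", "gelabelt")

# one lookahead per key, followed by the anchored extension
lookaheads <- paste0("(?=.*", keys, ")", collapse = "")
pattern <- paste0(lookaheads, ".*\\.csv$")

# perl = TRUE is required because lookaheads are a PCRE feature
result <- grep(pattern, files, value = TRUE, perl = TRUE)
result
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"
```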

Upvotes: 4

Ronak Shah

Reputation: 389275

You could use an alternation (OR) pattern here, so that either of the two words may occur first:

grep("(201907.*gelabelt|gelabelt.*201907).*csv", files, value = TRUE) 
#[1] "MD_KFL_201707_201907_gelabelt_V78.csv"

The same pattern also matches when "gelabelt" occurs first:

grep("(201907.*gelabelt|gelabelt.*201907).*csv", "gelabelt_MD_KFL_201907.csv", value = TRUE)
#[1] "gelabelt_MD_KFL_201907.csv"

Upvotes: 1
