C_Mu

Reputation: 325

R: find files matching multiple conditions

I have read a few posts here about file matching, but my question is not exactly the same.

I am trying to find the file that matches 3 conditions. Each condition is the value of a variable set inside a loop, so it looks like I cannot put the variables directly into the pattern argument.

Here is an example:

c1 = "Curr Month"
c2 = "Entity Lst Yr"
c3 = "36008"
file_from = "my_path/"

f = list.files(path = paste0(file_from, "Account/"), pattern = glob2rx(c1*c2*c3))

The error message comes from the pattern argument:

non-numeric argument to binary operator

Any ideas are highly appreciated, thank you so much!

Upvotes: 0

Views: 428

Answers (2)

r2evans

Reputation: 160952

Uwe's comment might be simplest for you. If the three terms can appear in any order, then you need to be a little more creative.

Since I don't have your files, I'll create some samples:

# filelisting <- list.files(path=...) # no pattern
filelisting <- c(
  "Rob travel v1.2.docx",
  "the v1.2 version of travel for Rob.xlsx",
  "the v1.3 version of travel for Rob.xlsx",
  "the v1.2 version of travel for Carol.xlsx",
  "something else entirely.pptx",
  "C_Mu.R",
  "My travel v1.2.txt"
)
c1 <- "Rob"
c2 <- "travel"
c3 <- "v1.2"

If you need all three terms but want to allow them in any order, then joining them into a single pattern

grepl(paste(c1,c2,c3,sep=".*"), filelisting)
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

fails because it misses the second file, where the terms appear in a different order.

Here's a thought:

sapply(c(c1,c2,c3), grepl, filelisting)
#        Rob travel  v1.2
# [1,]  TRUE   TRUE  TRUE
# [2,]  TRUE   TRUE  TRUE
# [3,]  TRUE   TRUE FALSE
# [4,] FALSE   TRUE  TRUE
# [5,] FALSE  FALSE FALSE
# [6,] FALSE  FALSE FALSE
# [7,] FALSE   TRUE  TRUE

From here, you can simply look for rows where all values are true, such as

apply(sapply(c(c1,c2,c3), grepl, filelisting), 1, all)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

(then use that logical vector to index filelisting).
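As a minimal sketch of that indexing step, reusing the sample file names from above (the names conds, hits, matched and matched2 are only illustrative):

```r
filelisting <- c(
  "Rob travel v1.2.docx",
  "the v1.2 version of travel for Rob.xlsx",
  "the v1.3 version of travel for Rob.xlsx",
  "the v1.2 version of travel for Carol.xlsx",
  "something else entirely.pptx",
  "C_Mu.R",
  "My travel v1.2.txt"
)
conds <- c("Rob", "travel", "v1.2")

# One logical column per condition, one row per file name.
hits <- sapply(conds, grepl, filelisting)

# Keep only the file names for which every condition matched.
matched <- filelisting[apply(hits, 1, all)]
print(matched)

# Alternative that combines the per-condition vectors with `&`
# instead of building a matrix first.
matched2 <- filelisting[Reduce(`&`, lapply(conds, grepl, filelisting))]
```

The Reduce variant may be the safer choice if filelisting could ever hold a single name: in that case sapply simplifies to a plain vector rather than a matrix, and apply over rows fails.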

You can generalize this a little if you have many more than three conditions and/or the number of conditions can change.

allcs <- c("Rob", "travel", "v1.2", "docx")
apply(sapply(allcs, grepl, filelisting), 1, all)
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

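Each string is treated as a regular expression, so you can use real regex syntax within it (which also means that characters with a special regex meaning, such as the "." in "v1.2", need to be escaped if you want them matched literally):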

allcs <- c("Rob", "travel", "v1.2", "xlsx|docx")
apply(sapply(allcs, grepl, filelisting), 1, all)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
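To illustrate the escaping point, here is a small sketch with two hypothetical file names (the second one is made up to show the difference). It compares plain regex matching, fixed = TRUE, and an escaped dot:

```r
# Hypothetical names: the second one differs only in the character
# between "1" and "2".
filelisting <- c("Rob travel v1.2.docx", "Rob travel v1x2.docx")

# As a regex, "." matches any character, so "v1.2" also matches "v1x2".
as_regex <- grepl("v1.2", filelisting)

# fixed = TRUE treats the pattern as a literal string.
as_literal <- grepl("v1.2", filelisting, fixed = TRUE)

# Escaping the dot keeps regex matching but makes the dot literal.
escaped <- grepl("v1\\.2", filelisting)

print(rbind(as_regex, as_literal, escaped))
```

Extra arguments can be passed through sapply, as in sapply(allcs, grepl, filelisting, fixed = TRUE), but that would also turn "xlsx|docx" into a literal string, so it only suits conditions that contain no intentional regex.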

Upvotes: 0

MatAff

Reputation: 1333

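Are you trying to combine c1, c2 and c3 into a single regular expression with wildcards in between? In c1*c2*c3, R treats * as multiplication of character strings, hence the "non-numeric argument" error. Does the following work?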

reg <- glob2rx(paste(c1,c2,c3,sep="*"))
print(reg)

[1] "^Curr Month.*Entity Lst Yr.*36008$"
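One thing to watch: the pattern above is anchored with "^" and "$", so it only matches names that start with c1 and end with c3. If the real file names carry an extension or a prefix (an assumption, since the actual names are not shown), adding a leading and trailing "*" may help. A sketch using a scratch directory with two made-up file names:

```r
c1 <- "Curr Month"
c2 <- "Entity Lst Yr"
c3 <- "36008"

# Hypothetical files in a scratch directory, standing in for "my_path/Account/".
account_dir <- tempfile("Account_")
dir.create(account_dir)
file.create(file.path(account_dir, c(
  "Curr Month Entity Lst Yr 36008.csv",
  "Curr Month Entity Lst Yr 99999.csv"
)))

# Without a trailing "*", the "$" anchor rejects the ".csv" extension.
strict <- grepl(utils::glob2rx(paste(c1, c2, c3, sep = "*")),
                "Curr Month Entity Lst Yr 36008.csv")

# Leading and trailing "*" allow other text before c1 and after c3.
reg <- utils::glob2rx(paste0("*", c1, "*", c2, "*", c3, "*"))
f <- list.files(path = account_dir, pattern = reg)
print(f)
```

Inside a loop, the same paste0 call can be rebuilt on each iteration from the current values of c1, c2 and c3.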

Upvotes: 2
