Markus

Reputation: 347

R: Create a file list without a specific string

I'm trying to create a list of files from a directory containing files with the following patterns:

Name_Surname_12345_noe_xy.xls  
Name_Surname_12345_xy.xls

xy can be one or two characters.

Now I want a list of all files which do not contain "noe" in the filename. I can read in only the "noe" files using

fl = list.files(pattern = "noe.+xls$", recursive=TRUE, full.names=TRUE)

but found no way to exclude them. Any suggestions?

Many thanks
Markus

Upvotes: 1

Views: 1796

Answers (1)

Spacedman

Reputation: 94267

Get all the files and then use grep to find the noe ones and subset them out:

> all
[1] "Name_Surname_123425_xy.xls"    "Name_Surname_1234445_xy.xls"  
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"    
[5] "Name_Surname_13245_noe_xy.xls"
> all[grep("noe_xy.xls",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls"  "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"  
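
Applied to the question's list.files call, the same idea might look like the sketch below. The temporary directory, the file names, and the "_noe_" pattern are assumptions made up for illustration; matching against basename() is a precaution in case a directory name happens to contain the string.

```r
# Sketch: list everything first, then drop the "noe" files.
# The demo directory and file names are hypothetical.
d <- file.path(tempdir(), "filelist_demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("Name_Surname_12345_noe_xy.xls",
                                     "Name_Surname_12345_xy.xls",
                                     "Name_Surname_13245_noe_x.xls"))))

# Step 1: every .xls file, with full paths as in the question.
all_files <- list.files(d, pattern = "\\.xls$", recursive = TRUE, full.names = TRUE)

# Step 2: indices of names that do NOT match, used to subset the full paths.
# Only the file name is tested, not the directory part of the path.
fl <- all_files[grep("_noe_", basename(all_files), invert = TRUE)]
print(basename(fl))
```

With invert=TRUE, grep returns the indices of the non-matching elements, so the subset keeps working when every file or no file matches.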

Always make sure you check the edge cases where all or none of the files match:

> all[grep("xls",all,invert=TRUE)]
character(0)
> all[grep("fnord",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls"    "Name_Surname_1234445_xy.xls"  
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"    
[5] "Name_Surname_13245_noe_xy.xls"

Using grep with a negative index looks equivalent, but it breaks in one of these edge cases:

> all
[1] "Name_Surname_123425_xy.xls"    "Name_Surname_1234445_xy.xls"  
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"    
[5] "Name_Surname_13245_noe_xy.xls"
> all[-grep("noe_xy.xls",all)] # strip out the noe_xy.xls files
[1] "Name_Surname_123425_xy.xls"  "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"  

# works. Now strip out any xls files (should leave nothing)

> all[-grep("xls",all)]
character(0)

# yup, that works too. Now strip out 'fnord' files, shouldn't remove anything:

> all[-grep("fnord",all)]
character(0)

Epic fail! Reason is left as an exercise to the reader.
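
For readers who want to check their answer: when nothing matches, grep returns integer(0), negating an empty vector still gives an empty vector, and an empty index selects nothing. The sketch below shows this, along with a logical mask from grepl as one possible alternative; the vector name files and the shortened file list are assumptions for illustration.

```r
# Hypothetical, shortened file list (named 'files' rather than 'all').
files <- c("Name_Surname_123425_xy.xls", "Name_Surname_12345_noe_xy.xls")

idx <- grep("fnord", files)          # no match, so idx is integer(0)
print(identical(-idx, integer(0)))   # negating an empty vector leaves it empty
print(files[-idx])                   # an empty index selects nothing

# A logical mask has one element per file name, so it cannot collapse
# to an empty index when there are no matches.
keep_all  <- files[!grepl("fnord", files)]          # nothing removed
keep_some <- files[!grepl("noe_xy\\.xls", files)]   # "noe" file removed
keep_none <- files[!grepl("xls", files)]            # everything removed
```

The invert=TRUE form and the !grepl form both avoid the problem because neither relies on negating a possibly empty index vector.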

Upvotes: 3
