This command works to subset the data filelist to remove all "jpg" files:
filetype.isnotjpg <- setdiff(filelist, subset(filelist, grepl("\\.jpg$", filelist)))
This takes the character vector filelist, which contains the names of files from a directory. I want to return all files that are not of type "jpg", "doc", "pdf", "xls", etc., and I want to be able to specify as many types as I like to filter the list.
Ideally something like:
target.files <- setdiff(filelist, subset(filelist, grepl(
c("\\.jpg$", "\\.doc$", "\\.pdf$", "\\.xls$"), filelist)))
Applying the filters one after another does what I want:
a <- setdiff(files.list, subset(files.list, grepl("\\.tmp", files.list, ignore.case = TRUE)))
a <- setdiff(a, subset(a, grepl("\\.jpg", a, ignore.case = TRUE)))
a <- setdiff(a, subset(a, grepl("\\.pdf", a, ignore.case = TRUE)))
a <- setdiff(a, subset(a, grepl("\\.tif", a, ignore.case = TRUE)))
and so on for each file type. Could something like apply() do this in one go? I'm new to R, sorry.
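One way to fold the repeated filtering into a single call is Reduce(), which passes the running result through each pattern in turn. This is a sketch with made-up file names, not something from the thread; note one small difference: setdiff() also drops duplicate names, while the logical indexing used here keeps them.

```r
# Sample file names (hypothetical, for illustration only)
files.list <- c("notes.txt", "photo.JPG", "scan.pdf", "temp.tmp", "map.tif", "data.csv")

# Extensions to drop, written as regex fragments as in the question
exts <- c("\\.tmp", "\\.jpg", "\\.pdf", "\\.tif")

# Reduce() calls the function with (result so far, next pattern),
# starting from files.list, mirroring the chain of setdiff()/subset() calls
a <- Reduce(function(keep, pat) keep[!grepl(pat, keep, ignore.case = TRUE)],
            exts, files.list)
print(a)
```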
The solution from 42 works:
target.files <- setdiff(
files.list,
subset(files.list,
grepl(
paste(
c("\\.jpg", "\\.doc", "\\.pdf",
"\\.xls", "\\.tif", "\\.docx", "\\.xlsx", "\\.jpeg"),
collapse="|") ,
files.list,
ignore.case = TRUE)))
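The patterns in that solution are not anchored with "$", so "\\.doc" also matches names such as "report.docx" or "archive.doc.bak". A minimal sketch of a wrapper that anchors the extension at the end of the name follows; drop_ext is a hypothetical helper name, and it assumes the extensions are plain alphanumeric strings with no regex metacharacters.

```r
# Hypothetical helper: drop files whose extension is one of `exts` (given without the dot)
drop_ext <- function(files, exts, ignore.case = TRUE) {
  # Builds e.g. "\\.(jpg|doc|pdf)$": a literal dot, one of the extensions, then end of name
  pattern <- paste0("\\.(", paste(exts, collapse = "|"), ")$")
  files[!grepl(pattern, files, ignore.case = ignore.case)]
}

files.list <- c("report.docx", "report.doc", "image.jpeg", "table.XLS",
                "readme.md", "archive.doc.bak")
res <- drop_ext(files.list, c("jpg", "jpeg", "doc", "docx", "pdf", "xls", "xlsx", "tif"))
print(res)
```

Because of the "$" anchor, "archive.doc.bak" is kept, since its extension is "bak".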
You can use file_ext in tools to extract the extension from a filename. Then you can check whether each extension is in your list and use standard vector subsetting:
filelist[!(tools::file_ext(filelist) %in% c("jpg","jpeg","doc","pdf","xls"))]
If you need to ignore case, wrap tolower() around the extracted extensions, i.e. tolower(tools::file_ext(filelist)).
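A small self-contained run of the case-insensitive version, using made-up file names; tools::file_ext() returns the characters after the last dot, or "" when there is none.

```r
filelist <- c("Photo.JPG", "notes.txt", "scan.pdf", "budget.Xls", "script.R")
exts <- c("jpg", "jpeg", "doc", "pdf", "xls")

# Lower-case the extracted extensions so "JPG" and "Xls" match the lower-case list
keep <- filelist[!(tolower(tools::file_ext(filelist)) %in% exts)]
print(keep)
```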
I would try paste()-ing the patterns together with a collapsing separator of "|", which is the OR operator for regex:
target.files <- setdiff(filelist, subset(filelist, grepl( paste(
c("\\.jpg$", "\\.doc$", "\\.pdf$", "\\.xls$"), collapse="|") , filelist)))
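To see what the collapsed pattern looks like, and that negated logical indexing gives the same result as the setdiff()/subset() combination for names without duplicates, here is a sketch with made-up file names:

```r
filelist <- c("a.jpg", "b.doc", "c.pdf", "d.xls", "e.txt", "f.csv")

# One regex that matches any of the four extensions at the end of a name
pattern <- paste(c("\\.jpg$", "\\.doc$", "\\.pdf$", "\\.xls$"), collapse = "|")
cat(pattern, "\n")  # prints: \.jpg$|\.doc$|\.pdf$|\.xls$

# Keep every name that does NOT match the pattern
target.files <- filelist[!grepl(pattern, filelist)]
print(target.files)
```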
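Did you know that the list.files function also accepts a pattern argument? That lets you filter by extension while listing the directory, in a single step. Note that pattern keeps the files that match, so a call like this returns the jpg/doc/pdf/xls files themselves rather than excluding them: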
my_files <- list.files(path="/path/to/dir/",
pattern=paste( c("\\.jpg$", "\\.doc$", "\\.pdf$", "\\.xls$"),
collapse="|") )
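Since pattern selects the matching files, one way to get the files you want to keep is to take the set difference against an unfiltered listing. The sketch below creates a throwaway directory under tempdir() with made-up file names so it runs on its own; it also uses list.files()'s ignore.case argument so "b.DOC" is matched.

```r
# Throwaway directory with sample files, so the example is self-contained
d <- file.path(tempdir(), "list_files_demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("a.jpg", "b.DOC", "c.pdf", "d.xls", "e.txt", "f.csv"))))

pattern <- paste(c("\\.jpg$", "\\.doc$", "\\.pdf$", "\\.xls$"), collapse = "|")

# pattern= returns the MATCHING names, i.e. the files to drop
matched <- list.files(path = d, pattern = pattern, ignore.case = TRUE)

# The files to keep are everything else in the directory
target.files <- setdiff(list.files(path = d), matched)
print(target.files)
```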