Reputation: 141
Hello, I'm using RStudio 0.99.903 for Windows (64-bit). I am in the folder named "UCI HAR Dataset". If I list all the files in this folder and its subfolders using list.files(recursive = TRUE), all files are listed as below:
[Image: full list of .txt files]
However, I want to improve the code to list all .txt files except for "feature_info" and "README". This is what I used: list.files(recursive = TRUE, pattern = "[^\\<_info\\> | ^\\<README\\>].txt"). It worked in that it removed the two files I don't want; however, it also excludes the files under the "/train" folder. Can anyone help clarify why it stops looking at the second subfolder?
Thanks!
Upvotes: 2
Views: 83
Reputation: 627488
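The [^\\<_info\\> | ^\\<README\\>] part matches 1 char that is not equal to <, _, i, n, f, o, >, space, |, ^, R, E, A, D or M, as [^...] is a negated bracket expression matching all chars other than those defined in the brackets. Then a . matches any char, and txt matches txt as a literal char sequence. As a result, a file is skipped whenever the char right before .txt is one of the listed chars: a name ending in train.txt has n in that position, which would explain why the files under /train disappear, while a name ending in test.txt has t there and still matches.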
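To see the effect, the pattern from the question can be tried on a few names with grepl. This is only a minimal sketch: the file names below are illustrative assumptions standing in for the real listing, not the actual output of list.files.

```r
# Illustrative file names (assumed, not copied from the real listing).
files <- c("README.txt", "features_info.txt", "features.txt",
           "test/X_test.txt", "train/X_train.txt")

# The pattern from the question: one char outside the bracket set,
# then any char, then the literal "txt".
pattern <- "[^\\<_info\\> | ^\\<README\\>].txt"

# TRUE where the pattern matches somewhere in the name.
matched <- grepl(pattern, files)
print(data.frame(file = files, matched = matched))
```

Under these assumptions, only the names whose last char before .txt falls outside the bracket set (s and t here) are matched, so the train entry is dropped along with README and features_info.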
Since you cannot use PCRE regex with list.files, you may get all the files from the specified directory first, and then filter the result with grep, which supports PCRE regex (via perl = TRUE) with the lookarounds that you need here:
> files <- list.files("C:\\5")
> files
[1] "info.txt" "README.txt" "some-text.txt"
> files <- grep("(?<!^README|^info)\\.txt$", files, perl = TRUE, value = TRUE)
> files
[1] "some-text.txt"
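With a recursive listing, each entry is a relative path, so the ^ anchor would refer to the start of the path rather than the start of the file name. One possible adaptation, sketched here under assumptions, is to run the same kind of lookbehind against basename() of each path. The paths and the name features_info are illustrative stand-ins for the output of list.files(recursive = TRUE), not taken from the real folder.

```r
# Illustrative stand-in for list.files(recursive = TRUE); paths are assumed.
files <- c("README.txt", "features_info.txt", "features.txt",
           "test/X_test.txt", "train/X_train.txt")

# Test the file name only, so that ^ means "start of the name"
# even when the entry carries a subfolder prefix.
keep <- grepl("(?<!^README|^features_info)\\.txt$", basename(files), perl = TRUE)

# Keep the full relative paths of the names that passed the filter.
kept <- files[keep]
print(kept)
```

Indexing the original vector with the logical result keeps the subfolder prefix, which grep(..., value = TRUE) on the basenames alone would lose.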
Note that
(?<!^README|^info) - is a negative lookbehind that fails the match if there is README or info at the start of the string, and if they are located immediately to the left of the current location (that is right before...)
\\. - a single dot (the pattern is \. but we need to double backslashes in the string literals to denote a literal backslash)
txt - a literal char sequence
$ - end of string.
Upvotes: 1