Andres

Reputation: 109

R match pattern with multiple choice numbers of digits

I struggled to find a solution to this simple task: a pattern that matches a variable number of digits. For example, given these filenames:

file_1_text.txt    ==> Yes
file_3_text.txt    ==> Yes
file_10_text.txt   ==> Yes
file_690_text.txt  ==> Yes
file_text_1.txt    ==> No
file_text_243.txt  ==> No
text_file_10.txt   ==> No
file_text_1.txt    ==> No

And I want to match only the filenames with the structure:

file_('one, two or three digits')_text.txt

I found the solution on Stack Overflow, but only in comments on questions where this was not the main topic. Those suggestions turned out not to be needed for the questions they were posted under, so they are not easy to find for anyone looking for this specific matching pattern.

Upvotes: 0

Views: 49

Answers (2)

Andres

Reputation: 109

The best solution I have found so far is the \\d{1,3} pattern. Here are some examples of using \\d:

\\d       # matches any digit from 0 to 9, similar to [0-9].
\\d\\d    # matches two digits.
\\d{1}    # matches just one digit. The {} part specifies the number of digits.
\\d{1,}   # matches one or more digits.
\\d{3,}   # matches three or more digits.
\\d{0,10} # matches from zero up to a maximum of 10 digits, and...
\\d{1,3}  # matches from one to three digits.

This last case would be the solution to the question.
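To see how these quantifiers behave, here is a minimal sketch. The test strings and the surrounding "a" and "b" characters are made up for illustration; they are not part of the question. The anchors ^ and $ are added so that the digit count is checked against the whole string.

```r
# Hypothetical test strings with 1, 2, 3, 4 and 0 digits between "a" and "b".
samples <- c("a1b", "a12b", "a123b", "a1234b", "ab")

# From one to three digits.
one_to_three <- grepl("^a\\d{1,3}b$", samples)
one_to_three    # TRUE TRUE TRUE FALSE FALSE

# One or more digits.
one_or_more <- grepl("^a\\d{1,}b$", samples)
one_or_more     # TRUE TRUE TRUE TRUE FALSE

# Three or more digits.
three_or_more <- grepl("^a\\d{3,}b$", samples)
three_or_more   # FALSE FALSE TRUE TRUE FALSE
```

Without the anchors, \\d{1,3} would also report a match for "a1234b", because three of its four digits are enough to satisfy the pattern.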

A complete example that matches all the files with the structure mentioned above:

selected_files <- list.files(
  path = "mypath",  # replace "mypath" with the desired path.
  pattern = "^file_\\d{1,3}_text\\.txt$")  # \\. matches a literal dot.

Explanation: list.files() looks for filenames in the directory you specify in the path argument. The ^ character makes sure the match starts at the beginning of the filename, not in the middle. Similarly, the $ symbol makes sure that the extension part, .txt, is at the end of the filename. Note that an unescaped . matches any character, so write \\. to match a literal dot.
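The effect of the anchors and the escaped dot can be checked with a small sketch like the one below. It assumes a scratch directory under tempdir(); the directory name regex_demo and the extra filenames are hypothetical and chosen only to trigger the edge cases.

```r
# Hypothetical scratch directory, so no real files are touched.
demo_dir <- file.path(tempdir(), "regex_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Two names that should match, plus four near misses.
demo_names <- c("file_1_text.txt", "file_690_text.txt",
                "file_text_1.txt", "file_1234_text.txt",
                "my_file_10_text.txt", "file_10_textXtxt")
invisible(file.create(file.path(demo_dir, demo_names)))

# Anchored pattern with an escaped dot: only the intended structure matches.
strict <- list.files(path = demo_dir,
                     pattern = "^file_\\d{1,3}_text\\.txt$")

# No anchors and an unescaped dot: also picks up "my_file_10_text.txt"
# (match starts mid-name) and "file_10_textXtxt" (the dot matches "X").
loose <- list.files(path = demo_dir,
                    pattern = "file_\\d{1,3}_text.txt")

strict
loose
```

Here strict should contain only file_1_text.txt and file_690_text.txt, while loose returns those two plus the two near misses described in the comments.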

Upvotes: -2

Tim Biegeleisen

Reputation: 520878

You may match on the pattern ^file_\d{1,3}_text\.txt$ using grepl:

x <- c("file_1_text.txt", "file_3_text.txt", "file_10_text.txt", "file_690_text.txt", "file_text_1.txt", "file_text_243.txt", "text_file_10.txt", "file_text_1.txt")
output <- grepl("^file_\\d{1,3}_text\\.txt$", x)
output

[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
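If the matching names themselves are wanted rather than a logical vector, one possible follow-up is to subset x with the grepl result, or to call grep with value = TRUE. This is a sketch that reuses the same vector and pattern; the variable names pattern, matched_by_index and matched_by_grep are introduced here for illustration.

```r
x <- c("file_1_text.txt", "file_3_text.txt", "file_10_text.txt",
       "file_690_text.txt", "file_text_1.txt", "file_text_243.txt",
       "text_file_10.txt", "file_text_1.txt")
pattern <- "^file_\\d{1,3}_text\\.txt$"

# Option 1: logical indexing with the grepl result.
matched_by_index <- x[grepl(pattern, x)]

# Option 2: grep with value = TRUE returns the matching elements directly.
matched_by_grep <- grep(pattern, x, value = TRUE)

matched_by_grep
# "file_1_text.txt" "file_3_text.txt" "file_10_text.txt" "file_690_text.txt"
```

Both options give the same four filenames, in their original order.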

Upvotes: 0
