Thirst for Knowledge

Reputation: 1628

Select only matching files using multiple patterns in list.files

I have the following CSV files and I only want to select the ones whose 'pop' and 'throughput' values match within the file name:

example_pop_high_throughput_high_strategy.csv
example_pop_high_throughput_base_strategy.csv
example_pop_high_throughput_low_strategy.csv
example_pop_base_throughput_high_strategy.csv
example_pop_base_throughput_base_strategy.csv
example_pop_base_throughput_low_strategy.csv
example_pop_low_throughput_high_strategy.csv
example_pop_low_throughput_base_strategy.csv
example_pop_low_throughput_low_strategy.csv

I want only these:

example_pop_high_throughput_high_strategy.csv
example_pop_base_throughput_base_strategy.csv
example_pop_low_throughput_low_strategy.csv

I can use list.files to select all files with, for example, 'high':

file_names <- list.files("made/up/path", pattern = "high")

However, passing the pattern twice to match 'high' and 'high' didn't work:

file_names <- list.files("made/up/path", pattern = c("high", "high"))

Is there a way to select the files with matching 'pop' and 'throughput' values, preferably in a single expression?

Upvotes: 3

Views: 4771

Answers (2)

martin_joerg

Reputation: 1163

The following should work. The backreference \\1 requires the second occurrence to repeat whatever the first group matched:

file_names <- list.files("made/up/path", pattern = "(low|base|high).+\\1")
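
To check this without touching real data, here is a minimal sketch that recreates the nine file names from the question in a temporary directory and applies the same pattern. The directory name pattern_demo and the variables levels, combos, all_names and tmp_dir are made up for this illustration; tmp_dir simply stands in for "made/up/path".

```r
# Build all nine pop/throughput combinations from the question.
levels <- c("high", "base", "low")
combos <- expand.grid(throughput = levels, pop = levels,
                      stringsAsFactors = FALSE)
all_names <- sprintf("example_pop_%s_throughput_%s_strategy.csv",
                     combos$pop, combos$throughput)

# Create empty files in a throwaway directory (hypothetical stand-in
# for "made/up/path").
tmp_dir <- file.path(tempdir(), "pattern_demo")
dir.create(tmp_dir, showWarnings = FALSE)
invisible(file.create(file.path(tmp_dir, all_names)))

# A single regular expression: \\1 must repeat whatever the first
# group (low, base or high) captured earlier in the name.
file_names <- list.files(tmp_dir, pattern = "(low|base|high).+\\1")
print(file_names)

# Clean up the temporary files.
unlink(tmp_dir, recursive = TRUE)
```

Only the three names in which the same level appears twice should come back. Note that the pattern is passed as one string, since list.files documents pattern as a single regular expression rather than a vector of them.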

Upvotes: 6

Gurmanjot Singh

Reputation: 10360

Try this regex (inside an R string literal, the backslash must be doubled, i.e. \\1):

^.*?pop_([^_]+)_throughput_\1.*$
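
As a rough sketch of how this regex could be used from R, the snippet below applies it to a plain character vector with grepl. The vector files and the variables pattern and matched are made up for the illustration, and perl = TRUE is an assumption chosen here so that the lazy quantifier is handled by the PCRE engine.

```r
# A few of the file names from the question (hypothetical sample vector).
files <- c(
  "example_pop_high_throughput_high_strategy.csv",
  "example_pop_high_throughput_base_strategy.csv",
  "example_pop_base_throughput_base_strategy.csv",
  "example_pop_low_throughput_high_strategy.csv",
  "example_pop_low_throughput_low_strategy.csv"
)

# Same regex as above, with the backslash doubled for the R string.
# ([^_]+) captures the 'pop' value; \\1 requires the 'throughput'
# value to start with the same text.
pattern <- "^.*?pop_([^_]+)_throughput_\\1.*$"

# Keep only the names where both values match.
matched <- files[grepl(pattern, files, perl = TRUE)]
print(matched)
```

Unlike the alternation (low|base|high), this version does not hard-code the level names, so it should also pick up other values as long as they contain no underscore. The same string can presumably be given to list.files as pattern, although list.files has no perl argument, so treat that as something to verify.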


Upvotes: 3
