Joshua Rosenberg

Reputation: 4226

Select only exact matches with dplyr's matches() helper function

I am using the matches() helper function as part of an argument to select() in the dplyr package.

The function looks like this for a hypothetical df data frame:

select(df, variable_1_name, matches("variable_2_name"))

At least as I'm currently using it, variable_2_name must be passed as a string to select().

However, if there is another variable in df that matches "variable_2_name", such as "variable_2_name_recode", then matches() will match both of those variables. Is it possible to match only exact matches with a dplyr function, or with a different approach?

Upvotes: 5

Views: 9396

Answers (3)

steveb

Reputation: 5532

You can of course just do the following when a string is not required:

select(df, variable_1_name, variable_2_name)

matches() takes a regular expression, so you can anchor the pattern at both ends:

# '^' anchors the match at the beginning of the string and
# '$' anchors the match at the end of the string.
select(df, variable_1_name, matches("^variable_2_name$"))

This should match only variable_2_name.
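To see the difference the anchors make, here is a minimal sketch on a made-up data frame (the column names follow the question; the data itself is an assumption for illustration). It also shows that matches() is case-insensitive by default, which matters if "exact" should include case.

```r
library(dplyr)

# Hypothetical data frame: one column name extends the target name
df <- data.frame(
  variable_1_name = 1:3,
  variable_2_name = 4:6,
  variable_2_name_recode = 7:9
)

# Unanchored pattern: any column name containing the pattern is selected
loose <- select(df, variable_1_name, matches("variable_2_name"))
names(loose)

# Anchored pattern: the whole column name has to equal the pattern
exact <- select(df, variable_1_name, matches("^variable_2_name$"))
names(exact)

# matches() has ignore.case = TRUE by default; turn it off for a
# case-sensitive exact match
strict <- select(df, variable_1_name,
                 matches("^variable_2_name$", ignore.case = FALSE))
names(strict)
```

Keep in mind that the name is still treated as a regular expression, so a column name containing characters such as . would need escaping for the match to be truly exact.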

If you have a function that does the select based on a column name passed as a string, you could do the following (as mentioned by Psidom in a comment). The first example is simpler; the second is closer to what you are looking for.

### Example 1
### Given function and the 'df' with the column 'variable_2_name'
my_func <- function(df, colname) { df %>% select_(colname) }
my_func(df, 'variable_2_name') # Call with column name string

### Example 2
### Combining one unquoted column name with a column name given as a string.
### 'df' has columns 'variable_1_name' and 'variable_2_name'
my_func <- function(df, colname) {
    df %>% select_(quote(variable_1_name), colname)
}
### Call with column name returns 2 columns of data
### 'variable_1_name' and 'variable_2_name'
my_func(df, 'variable_2_name')

Edit

dplyr::select_ is now deprecated, but the code above should be adaptable to use dplyr::select instead of dplyr::select_.
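One way Example 2 might look without select_ is sketched below. It assumes dplyr 1.0.0 or later, where the all_of() helper is available; select_columns is a hypothetical function name and the data frame is made up for illustration.

```r
library(dplyr)

# Hypothetical data frame with the columns used in the question
df <- data.frame(
  variable_1_name = 1:3,
  variable_2_name = 4:6,
  variable_2_name_recode = 7:9
)

# select_columns is a hypothetical helper: one unquoted column name
# plus one column name supplied as a string
select_columns <- function(df, colname) {
  df %>% select(variable_1_name, all_of(colname))
}

# all_of() compares the string to column names literally, so
# 'variable_2_name_recode' is not picked up
result <- select_columns(df, "variable_2_name")
names(result)
```

Because all_of() does no pattern matching, no anchors or escaping are needed here.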

Upvotes: 11

teichert

Reputation: 4713

You can use the one_of helper from dplyr:

select(df, variable_1_name, one_of("variable_2_name"))

As desired, this only selects columns that are an exact match, avoids potential problems if your column name includes regexp characters (e.g. .), and doesn't rely on the deprecated select_. It's also nice that it works even if the string-valued column name is stored in a variable:

# still works when the column name is stored in a variable `v`
v <- "variable_2_name"
select(df, variable_1_name, one_of(v))
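In more recent releases (tidyselect 1.0.0 and later, re-exported by dplyr), one_of() is superseded by all_of() and any_of(). The sketch below, on a made-up data frame, shows how the two differ when a requested name is missing; "not_a_column" is a deliberately nonexistent name used only for illustration.

```r
library(dplyr)

# Hypothetical data frame
df <- data.frame(variable_1_name = 1:3, variable_2_name = 4:6)

# One name that exists and one that does not
wanted <- c("variable_2_name", "not_a_column")

# any_of() selects the names that exist and silently skips the rest
lenient <- select(df, variable_1_name, any_of(wanted))
names(lenient)

# all_of() is strict: a missing name is an error, caught here with try()
attempt <- try(select(df, variable_1_name, all_of(wanted)), silent = TRUE)
strict_failed <- inherits(attempt, "try-error")
strict_failed
```

Both helpers match names literally, so either works for the exact-match case; the choice mostly comes down to whether a missing column should stop the code.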

Upvotes: 2

MrFlick

Reputation: 206242

If you have just one quoted variable and one string, I'd probably use the standard evaluation version and not bother with matches():

library(dplyr)
dd <- data.frame(a = 1:3, b = 1:3, aa = 1:3)
dd %>% select_(quote(b), "a")

Since b isn't a string, we need to quote() it in this case. Or just use a string for b as well:

myvar <- "a"
dd %>% select_("b", myvar)

Upvotes: 2
