Positive and negative subsetting using dplyr::contains() and dplyr::select() in R

Question

I'm trying to achieve positive subsetting using a combination of dplyr::select() and dplyr::contains(), with the goal of subsetting by multiple string matches.

Minimal working example: when starting off with df1 and doing negative subsetting, I generate df2 as expected. In contrast, when attempting positive subsetting of df1, I generate df3 (no columns) when I'd have expected something like df4. Thanks for any help.

library(dplyr)
df1 <- data.frame("ppt_paint"=c(45,98,23),"het_heating"=c(1,1,2) ,"orm_wood"=c("QQ","OA","BB"), "hours"=c(4,6,4), "distance"=c(23,65,21))
# Negative subsetting: works as expected
df2 <- df1 %>% select(-contains("ppt_")) %>% select(-contains("het_")) %>% select(-contains("orm_"))
# Positive subsetting: returns a data frame with no columns
df3 <- df1 %>% select(contains("ppt_")) %>% select(contains("het_")) %>% select(contains("orm_"))
# What I expected df3 to look like
df4 <- data.frame("ppt_paint"=c(45,98,23),"het_heating"=c(1,1,2) ,"orm_wood"=c("QQ","OA","BB"))

Vincent Bonhomme · Accepted Answer

Think about what happens after df1 %>% select(contains("ppt_")), and have a look at the resulting data.frame. As asked, it retains only the one column whose name contains "ppt_". The subsequent select() calls cannot work as you expect, because the other columns are no longer there, no matter what you pass to select().
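To make this concrete, here is a small sketch that runs the chain from the question one step at a time. The names step1 and step2 are only illustrative; df1 is the data frame from the question.

```r
library(dplyr)

# df1 as defined in the question
df1 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2),
                  orm_wood = c("QQ", "OA", "BB"), hours = c(4, 6, 4),
                  distance = c(23, 65, 21))

# First select(): only the column containing "ppt_" survives
step1 <- df1 %>% select(contains("ppt_"))
names(step1)

# Second select(): step1 has no column containing "het_",
# so nothing matches and the result has zero columns
step2 <- step1 %>% select(contains("het_"))
ncol(step2)
```

Each select() only sees the columns that the previous one kept, which is why the chained positive selection ends up empty while the chained negative selection works.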

You can keep the same idea but combine your three keys in a single select():

df1 %>% select(contains("ppt_"), contains("het_"), contains("orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB
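As a possible shorthand, contains() can also take a character vector and select the union of the matches. This sketch assumes a reasonably recent dplyr/tidyselect (vector arguments were added in tidyselect 1.0.0, as far as I know); the variable name keys is just illustrative.

```r
library(dplyr)

# df1 as defined in the question
df1 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2),
                  orm_wood = c("QQ", "OA", "BB"), hours = c(4, 6, 4),
                  distance = c(23, 65, 21))

# Literal substrings to look for in the column names
keys <- c("ppt_", "het_", "orm_")

# A column is kept if its name contains any of the keys
res <- df1 %>% select(contains(keys))
names(res)
```

This keeps the keys in one place, which helps if the same list is reused for both positive and negative selection.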

Alternatively, you can do it with matches(), which accepts regular expressions:

df1 %>% select(matches("ppt_|het_|orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB

By the way, you can also use matches() to shorten your "negative" selection:

df1 %>% select(-matches("ppt_|het_|orm_"))
  hours distance
1     4       23
2     6       65
3     4       21
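As a quick sanity check, the positive and negative selections with the same pattern should split the columns of df1 between them. A minimal sketch, with pattern, kept and dropped as illustrative names:

```r
library(dplyr)

# df1 as defined in the question
df1 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2),
                  orm_wood = c("QQ", "OA", "BB"), hours = c(4, 6, 4),
                  distance = c(23, 65, 21))

# One regular expression reused for both directions
pattern <- "ppt_|het_|orm_"

kept    <- df1 %>% select(matches(pattern))   # positive selection
dropped <- df1 %>% select(-matches(pattern))  # negative selection

# Every column of df1 lands in exactly one of the two results
all_covered <- setequal(c(names(kept), names(dropped)), names(df1))
no_overlap  <- length(intersect(names(kept), names(dropped))) == 0
```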
