Parseltongue

Reputation: 11657

Select column names by regex pattern

I want to select all columns whose names start with one of four prefixes (CB, LB, LW, or CW) but exclude any column whose name contains the string "con".

My current approach is:

tester <- df_ans[,names(df_ans) %in% colnames(df_ans)[grepl("^(LW|LB|CW|CB)[A-Z_0-9]*",colnames(df_ans))]]
tester <- tester[,names(tester) %in% colnames(tester)[!grepl("con",colnames(tester))]]

Is there a better / more efficient way to do this in a library like dplyr?

Upvotes: 5

Views: 6533

Answers (2)

akrun

Reputation: 886938

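We can use matches() inside select(): the first call keeps columns that start with one of the four prefixes, and the negated call drops any column whose name contains "con".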

library(dplyr)
df %>%
   select(matches("^(CB|LB|LW|CW)"), -matches("con"))
#   CB1 LB2 CW3 LW20
#1   3   9   6    1
#2   3   3   4    5
#3   7   7   7    7
#4   5   8   7    2
#5   6   3   3    3

data

set.seed(24)
df <- as.data.frame(matrix(sample(1:9, 10 * 5, replace = TRUE),
       ncol = 10, dimnames = list(NULL, c("CB1", "LB2", "CW3", "WC1",
     "LW20", "conifer", "hercon", "other", "other2", "other3"))))
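One detail worth knowing: matches() ignores case by default (ignore.case = TRUE), whereas grepl() is case-sensitive, so the two approaches can differ on names such as "cb1" or "Confirmed". The sketch below is one way to make the dplyr version case-sensitive and to combine both conditions with tidyselect's boolean operators. It assumes dplyr 1.0.0 or later; the data frame toy and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: names chosen to exercise both conditions
toy <- data.frame(CB1 = 1:3, LB2 = 4:6, CW_con = 7:9, LWcon1 = 1:3,
                  conifer = 4:6, other = 7:9)

# Keep columns that start with one of the four prefixes AND
# do not contain "con". ignore.case = FALSE mirrors grepl()'s
# case-sensitive default.
kept <- toy %>%
  select(matches("^(CB|LB|LW|CW)", ignore.case = FALSE) &
           !matches("con", ignore.case = FALSE))

names(kept)
```

With this toy data only CB1 and LB2 should remain, since CW_con and LWcon1 have a wanted prefix but also contain "con".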

Upvotes: 9

G. Grothendieck

Reputation: 269421

In base R, combine two grepl() tests on the column names and index the data frame with the result:

nms <- names(df_ans)
df_ans[ grepl("^(LW|LB|CW|CB)", nms) & !grepl("con", nms) ]
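To see the base R approach end to end, here is a minimal self-contained sketch. The contents of df_ans are hypothetical, since the question does not show the real data. It also shows a single-regex variant using a negative lookahead, which needs perl = TRUE. Indexing with one argument (df_ans[keep]) returns a data frame even when only one column matches, unlike df_ans[, keep].

```r
# Hypothetical stand-in for the asker's df_ans
df_ans <- data.frame(CB1 = 1:3, LB_con = 4:6, LW20 = 7:9,
                     conifer = 1:3, other = 4:6)

nms <- names(df_ans)

# TRUE where the name has a wanted prefix and does not contain "con"
keep <- grepl("^(LW|LB|CW|CB)", nms) & !grepl("con", nms, fixed = TRUE)
result <- df_ans[keep]

# Same selection in one pattern: (?!.*con) rejects names containing "con"
keep2 <- grepl("^(?!.*con)(LW|LB|CW|CB)", nms, perl = TRUE)

names(result)
```

Here both keep and keep2 should select CB1 and LW20. The two-call form is arguably easier to read; the lookahead form is handy when a single pattern is required.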

Upvotes: 5
