user2323534

Reputation: 595

Have select in dplyr match more than one pattern

My goal is to select several columns from mydata that contain certain patterns.

mydata <- data.frame(q1 = rnorm(10), q10 = rnorm(10), q12 = rnorm(10), q20 = rnorm(10))

Method 1 - using grep - does what I need in a parsimonious way:

myvars <- names(mydata)[grep("^q10|^q12", names(mydata))]
temp <- mydata[myvars]
tbl_df(temp)

I am trying to do it purely in dplyr. However, I am not finding anything more parsimonious (like the grep approach) than:

temp <- cbind(select(mydata, starts_with("q10")), select(mydata, starts_with("q12")))
tbl_df(temp)

It's too much code. How could I make it work with an "|"? I tried the following, but none of them works:

select(mydata, starts_with("q10|q12"))
select(mydata, starts_with(c("q10","q12")))
temp <- select(mydata, starts_with("q10","q12"))
select(mydata, starts_with(c("q10"))|starts_with(c("q12")))

Advice? Thank you!

Upvotes: 2

Views: 1007

Answers (1)

Rich Scriven

Reputation: 99331

From the select() help file, I gather that matches() is the only selection helper that accepts a regular expression; starts_with() treats its argument as a literal string, which is why "q10|q12" finds nothing. You can use the regular expression ^q1(0|2), which anchors at the beginning of the name and matches q1 followed by either 0 or 2.

select(mydata, matches("^q1(0|2)"))
#            q10        q12
# 1  -0.97766671  1.2691732
# 2  -1.17397582 -0.8175758
# 3  -1.98684643  0.1117284
# 4   1.12142980  0.5737528
# 5   0.41680505  0.8974448
# 6   1.47558382 -1.5122752
# 7   0.39651297 -0.5282083
# 8  -0.13266148  0.8281671
# 9  -0.66982395  0.1239249
# 10  0.06119857 -0.3484675
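
If the prefixes live in a character vector, the same idea can be generalized by pasting them into one anchored pattern. This is only a sketch: the names prefixes and pattern are made up for illustration, set.seed() is there just to make the example reproducible, and it assumes the prefixes contain no regex metacharacters (otherwise they would need escaping first).

```r
library(dplyr)

set.seed(1)  # assumption: only here so the example is reproducible
mydata <- data.frame(q1 = rnorm(10), q10 = rnorm(10), q12 = rnorm(10), q20 = rnorm(10))

# Hypothetical vector of the prefixes to keep (not part of the original question).
prefixes <- c("q10", "q12")

# Collapse the prefixes into "^(q10|q12)"; the group keeps the ^ anchor
# applied to every alternative, not just the first one.
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")

# matches() interprets its argument as a regular expression.
temp <- select(mydata, matches(pattern))
names(temp)
```

Wrapping the alternatives in parentheses matters: with "^q10|q12" only q10 would be anchored, so a column such as xq12 would also be picked up.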

Upvotes: 5
