SteveS

Reputation: 4040

Regular Expression in str_detect?

I have colnames like c1c5, c5c1, c4c3 ...

I want to retrieve all colnames that start or end with c4 or c5.

I have tried using the following:

str_detect(colnames(df), "c5c\\d+")

str_detect(colnames(df), "c4c\\d+")

str_detect(colnames(df), "c\\d+c4")

str_detect(colnames(df), "c\\d+c5")

Is there any way to combine it to one expression? Please advise.

Upvotes: 5

Views: 3269

Answers (3)

M L

Reputation: 116

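You can also do it this way. It works even if a string contains multiple words. Note that the word boundary has to be written as \\b inside an R string, since a single \b is a backspace character.

str_detect(colnames(df), "(\\bc[45])|(c[45]\\b)")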
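A minimal sketch of the word-boundary idea on strings that hold several names. The vector `x` is made up for illustration, and base R's `grepl` with `perl = TRUE` is used here instead of `str_detect` so that the snippet runs without extra packages; in an R string literal the regex `\b` is written as `"\\b"`.

```r
# Hypothetical strings, each containing several space-separated names
x <- c("c1c5 c2c3", "c2c3 c4c1", "c2c3 c1c2")

# "\\b" in an R string is the regex \b (word boundary):
# match c4/c5 at the start of a word, or c4/c5 at the end of a word
hits <- grepl("\\bc[45]|c[45]\\b", x, perl = TRUE)
print(hits)
```

The first string matches because c1c5 ends with c5, the second because c4c1 starts with c4, and the third has no word that starts or ends with c4 or c5.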

Upvotes: 1

Wiktor Stribiżew

Reputation: 626709

You may use

str_detect(colnames(df), "^c[45]|c[45]$")

Or, with base R:

grep("^c[45]|c[45]$", colnames(df))

The regex is ^c[45]|c[45]$:

  • ^ - start of string
  • c - a c
  • [45] - 4 or 5
  • | - or
  • c[45] - c4 or c5 ...
  • $ - ... at the end of the string.
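As a quick check, the pattern can be tried on a few sample names. This is only a sketch in base R: the vector `nms` is made up for illustration and stands in for `colnames(df)`, and `grepl` is used so the snippet needs no extra packages.

```r
# Hypothetical column names standing in for colnames(df)
nms <- c("c1c5", "c5c1", "c4c3", "c2c3", "c1c4")

# TRUE where a name starts with c4/c5 or ends with c4/c5
hits <- grepl("^c[45]|c[45]$", nms)

# Keep only the matching names
matched <- nms[hits]
print(matched)
```

Only c2c3 is dropped here, since it neither starts nor ends with c4 or c5.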

Upvotes: 4

Dave

Reputation: 359

Try with sapply:

colnames(df)[sapply(X = colnames(df), FUN = function (X) substr(X, 1, 2) %in% c("c4", "c5") | substr(X, 3, 4) %in% c("c4", "c5"))]

This returns the colnames that start or end with "c4" or "c5".

If your colnames are longer than 4 characters, or you want to match something other than "c4" or "c5", you can generalize it with:

patterns <- c("c4", "c5") #you can change it

colnames(df)[sapply(X = colnames(df), FUN = function (X) substr(X, 1, 2) %in% patterns  | substr(X, nchar(X) - 1, nchar(X)) %in% patterns )]
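To illustrate the generalized version, here is a small sketch on a made-up data frame whose names are longer than 4 characters. The data frame and its column names are hypothetical, and the snippet assumes every entry in `patterns` is exactly two characters long, as in the code above.

```r
# Hypothetical data frame; the column names are made up for illustration
df <- data.frame(c4c10 = 1, c12c5 = 2, c2c3 = 3, c1c2 = 4)

patterns <- c("c4", "c5")

# Compare the first two and the last two characters of each name
# with the patterns
keep <- sapply(colnames(df), function(x) {
  substr(x, 1, 2) %in% patterns |
    substr(x, nchar(x) - 1, nchar(x)) %in% patterns
})

selected <- colnames(df)[keep]
print(selected)
```

Here c4c10 is kept because it starts with c4, and c12c5 because it ends with c5. For patterns of other lengths, the hard-coded offsets in `substr` would need to change accordingly.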

You can generalize it further, depending on your case.

Upvotes: 4
