lokheart

Reputation: 24685

Regular expression to filter phrases based on their last few characters in R

I have a few phrases like the ones below:

abc_xy_def
abc_xy
abc_vw_def
abc_vw
def_ab

I want to use a regular expression to split them into two groups: one with the abc_ head and the _def tail, and another with the abc_ head only.

I have tried something like this:

> grepl("abc_[(a-z_)*][^def]","abc_xy_def")
[1] TRUE
> grepl("abc_[(a-z_)*][^def]","abc_xy")
[1] TRUE

But it doesn't work: both calls return TRUE. Can anyone help? Thanks.

Upvotes: 0

Views: 140

Answers (2)

malko

Reputation: 2382

I don't know R, but would this work?

grepl("^abc_.+_def$","abc_xy_def")

It seems you have misunderstood "[^def]": it matches a single character that is not d, e or f. So your regexp "abc_[(a-z_)*][^def]" matches any string containing abc_ followed by one character from the set (, a to z, _, ) or *, followed by one more character that is not d, e or f. Both of your test strings contain abc_xy, which satisfies that, so both calls return TRUE.
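A quick way to see this in R is to extract what the pattern actually matches. This is a sketch that assumes the five sample phrases from the question are stored in a vector called phrases (a name chosen for illustration); regexpr() and regmatches() are base R functions.

```r
# The sample phrases from the question (vector name is illustrative)
phrases <- c("abc_xy_def", "abc_xy", "abc_vw_def", "abc_vw", "def_ab")

# The original attempt: "[(a-z_)*]" is ONE character from the set
# { (, a-z, _, ), * } and "[^def]" is ONE character other than d, e, f
bad_pattern <- "abc_[(a-z_)*][^def]"

# Every phrase containing "abc_" plus two more characters matches
print(grepl(bad_pattern, phrases))
# → TRUE TRUE TRUE TRUE FALSE

# regexpr() finds the first match in each string and regmatches() extracts it
# (strings with no match are dropped from the result)
matched <- regmatches(phrases, regexpr(bad_pattern, phrases))
print(matched)
# → "abc_xy" "abc_xy" "abc_vw" "abc_vw"
```

The extracted text stops after two characters past abc_, so the pattern never looks at whether the string ends in _def.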

Here is what each part of the pattern I propose does:

  • ^ means the match must start at the beginning of the string
  • abc_ matches the literal text abc_
  • .+ matches any character (except \n in many regex flavors), one or more times
  • _def matches the literal text _def
  • $ means the match must end at the end of the string
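Applied to the whole sample vector in R, the pattern keeps only the phrases with both the head and the tail. This is a sketch that assumes the phrases from the question are held in a character vector, here called phrases for illustration.

```r
phrases <- c("abc_xy_def", "abc_xy", "abc_vw_def", "abc_vw", "def_ab")

# ^abc_  : string must start with "abc_"
# .+     : at least one more character in between
# _def$  : string must end with "_def"
with_def <- grepl("^abc_.+_def$", phrases)

print(phrases[with_def])
# → "abc_xy_def" "abc_vw_def"
```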

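If you want those without _def at the end, the negative lookahead has to sit right after abc_ so that it checks the whole rest of the string: "^abc_(?!.*_def$).+". A trailing lookahead such as "abc_.+(?!def)" does not work, because .+ can run to the end of the string, where (?!def) always succeeds. In R, lookarounds also require perl = TRUE.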
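A quick check in R, assuming the sample phrases from the question. It compares the trailing lookahead with one anchored right after abc_, and adds a lookaround-free alternative that combines two grepl() calls (a swapped-in technique, not part of the answer above).

```r
phrases <- c("abc_xy_def", "abc_xy", "abc_vw_def", "abc_vw", "def_ab")

# Lookarounds are a PCRE feature, so perl = TRUE is required in R.
# Trailing lookahead: .+ runs to the end of the string, where (?!def)
# trivially succeeds, so "_def" phrases are NOT excluded.
trailing <- grepl("abc_.+(?!def)", phrases, perl = TRUE)

# Lookahead anchored right after "abc_": it inspects the whole tail.
anchored <- grepl("^abc_(?!.*_def$).+", phrases, perl = TRUE)

# Lookaround-free alternative: starts with "abc_" AND does not end with "_def".
plain <- grepl("^abc_", phrases) & !grepl("_def$", phrases)

print(phrases[trailing])  # still includes "abc_xy_def" and "abc_vw_def"
print(phrases[anchored])  # "abc_xy" "abc_vw"
print(phrases[plain])     # "abc_xy" "abc_vw"
```

The two-call version avoids perl = TRUE entirely and is arguably easier to read, at the cost of scanning each string twice.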

Upvotes: 1

Osman Turan

Reputation: 1371

To match every phrase with the abc_ head, with or without the tail: ^abc_[a-z]*(_def|)$

To match only those with the _def tail: ^abc_[a-z]*_def$

To match only those without the _def tail: ^abc_[a-z]*$
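These three patterns can be combined to produce the two groups the question asks for. This is a sketch in R assuming the sample phrases from the question; it writes the optional tail as (_def)?, which is an equivalent and widely supported spelling of (_def|). The group names with_def and abc_only are illustrative.

```r
phrases <- c("abc_xy_def", "abc_xy", "abc_vw_def", "abc_vw", "def_ab")

# [a-z]* cannot match "_", so the middle part must be a single lowercase word
any_abc  <- grepl("^abc_[a-z]*(_def)?$", phrases)  # abc_ head, optional _def tail
with_def <- grepl("^abc_[a-z]*_def$", phrases)     # abc_ head and _def tail
no_def   <- grepl("^abc_[a-z]*$", phrases)         # abc_ head only

# Split the abc_ phrases into the two requested groups
groups <- split(phrases[any_abc],
                ifelse(with_def[any_abc], "with_def", "abc_only"))
print(groups)
```

Because [a-z]* excludes the underscore, a phrase such as abc_xy_def can only satisfy the first two patterns, never the third, so the groups do not overlap.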

If it's not accurate, please clarify your question.

Upvotes: 1
