Starcalibre

Reputation: 105

Partial string matching with grep and regular expressions

I have a vector of three-character strings, and I'm trying to write a command that finds which members of the vector have a particular letter as the second character.

As an example, say I have this vector of 3-letter strings:

example = c("AWA","WOO","AZW","WWP")

I can use grepl and glob2rx to find strings with W as the first or last character.

> grepl(glob2rx("W*"),example)
[1] FALSE  TRUE FALSE  TRUE

> grepl(glob2rx("*W"),example)
[1] FALSE FALSE  TRUE FALSE

However, I don't get the right result when I try using it with glob2rx("*W*"):

> grepl(glob2rx("*W*"),example)
[1] TRUE TRUE TRUE TRUE

I am sure my understanding of regular expressions is lacking; however, this seems like a pretty straightforward problem and I can't seem to find the solution. I'd really love some assistance!

For future reference, I'd also really like to know whether I could extend this to longer strings. If I have strings that are 5 characters long, could I use grepl to return the strings where W is the third character?

Upvotes: 5

Views: 4389

Answers (2)

IRTFM

Reputation: 263481

I would have thought that this was the regex way:

> grepl("^.W", example)
[1]  TRUE FALSE FALSE  TRUE
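
For context on why the glob attempt matched everything: glob2rx() translates * to .* and ? to . and, by default, drops a trailing .*, so "*W*" ends up meaning "contains a W anywhere". The following is a minimal sketch, assuming the example vector from the question; the ? wildcard is the glob way of saying "exactly one character", and the variable name second_is_w is only illustrative.

```r
example <- c("AWA", "WOO", "AZW", "WWP")

# "*W*" translates to "^.*W": any string containing a W
glob2rx("*W*")

# "?W*" translates to "^.W": exactly one character, then W
glob2rx("?W*")

second_is_w <- grepl(glob2rx("?W*"), example)
second_is_w
# [1]  TRUE FALSE FALSE  TRUE
```

Since every string in the example contains a W somewhere, the "*W*" pattern returns TRUE for all four.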

If you wanted a particular, prespecified position, then:

> grepl("^.{1}W", example)
[1]  TRUE FALSE FALSE  TRUE

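Writing the offset as a repetition count allows the pattern to be calculated programmatically:

pos = 2        # 1-based position of the character to test
n = pos - 1    # number of characters to skip before it
grepl(paste0("^.{", n, "}W"), example)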
[1]  TRUE FALSE FALSE  TRUE
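
The same idea extends to the 5-character case raised in the question. Below is a rough sketch that wraps the pattern construction in a function; has_char_at is a hypothetical helper name (not part of base R), the vector five is made up for illustration, and the sketch assumes the character being searched for has no special meaning in a regular expression.

```r
# has_char_at() is a hypothetical helper name, not a base R function.
# Assumes `char` is a single literal character with no special regex meaning.
has_char_at <- function(x, char, pos) {
  stopifnot(pos >= 1)
  # Skip exactly pos - 1 characters from the start, then require `char`
  grepl(paste0("^.{", pos - 1, "}", char), x)
}

five <- c("ABWDE", "WABCD", "AAWWW", "ABCDW")  # made-up 5-character strings
has_char_at(five, "W", 3)
# [1]  TRUE FALSE  TRUE FALSE
```

Anchoring with ^ is what ties the match to a fixed position; without it, the pattern could match a W anywhere that is preceded by enough characters.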

Upvotes: 8

josliber

Reputation: 44340

If you have 3-character strings and need to check the second character, you could just test the appropriate substring instead of using regular expressions:

example = c("AWA","WOO","AZW","WWP")
substr(example, 2, 2) == "W"
# [1]  TRUE FALSE FALSE  TRUE
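
The substring test also covers the longer-string case from the question, since substr takes the start and stop positions directly. A small sketch, assuming a made-up vector of 5-character strings named five and an illustrative variable pos:

```r
five <- c("ABWDE", "WABCD", "AAWWW", "ABCDW")  # made-up 5-character strings
pos <- 3                                       # position to test (1-based)

# Extract the single character at `pos` from each string and compare
third_is_w <- substr(five, pos, pos) == "W"
third_is_w
# [1]  TRUE FALSE  TRUE FALSE
```

A string shorter than pos yields an empty substring, so the comparison is simply FALSE for it.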

Upvotes: 4
