grep and regular expressions - meta/wildcard characters

Question

This is a bare-minimum example drawn from a larger, more complex dataset. I'm just trying to get my head around something.

> grep("X10\\.1+",c("X10.10","X10.11","X10.12"))
[1] 1 2 3

I expected only element 2 to be returned, since '+' is supposed to mean 'one or more of the preceding element'. I thought the escaped period (which my real data contains, so I want to keep it in the example) might be causing the issue, so I tried without it:

> grep("X101+",c("X1010","X1011","X1012"))
[1] 1 2 3
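The escaped period is not what makes all three elements match. Below is a small sketch comparing the escaped and unescaped patterns. It assumes the pattern is written as an R string literal, where the regex escape \. has to be typed as "\\." (a single "\." is rejected by the R parser). The value "X10-11" is a hypothetical extra element, added only to show where the escape matters.

```r
x <- c("X10.10", "X10.11", "X10.12")

# In an R string the regex escape \. must be written "\\."
escaped <- grep("X10\\.1+", x)
# Without the escape, "." matches any single character, including "."
unescaped <- grep("X10.1+", x)

print(escaped)    # [1] 1 2 3
print(unescaped)  # [1] 1 2 3

# The escape only changes the result when something other than a literal
# period sits in that position ("X10-11" is a hypothetical value)
y <- c("X10.11", "X10-11")
print(grep("X10\\.1+", y))  # [1] 1
print(grep("X10.1+", y))    # [1] 1 2
```

So both versions return 1 2 3 on the original data; the escape only matters for excluding values that have some other character in place of the period.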

So is my understanding of how '+' works wrong?

CONCLUSION:

Thanks @James. So I had understood + to mean 'ANOTHER 1 or more of the preceding element', when it actually means 'JUST 1 or more of the preceding element'.

11+ would have done what I was thinking (an ADDITIONAL one or more 1s after the first 1, etc.). Cheers
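A quick check suggests 11+ does return only element 2 for these three values. The sketch below also appends a hypothetical value, "X10.112", to show that 11+ on its own is still unanchored and matches the "X10.11" at the start of that string. If values like that can occur in the real data, combining 11+ with $ may be the safer choice.

```r
x <- c("X10.10", "X10.11", "X10.12")
print(grep("X10\\.11+", x))  # [1] 2

# "X10.112" is a hypothetical extra value: the unanchored pattern still
# finds "X10.11" inside it, while adding $ excludes it
z <- c(x, "X10.112")
unanchored_11 <- grep("X10\\.11+", z)
anchored_11 <- grep("X10\\.11+$", z)
print(unanchored_11)  # [1] 2 4
print(anchored_11)    # [1] 2
```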

James · Accepted Answer

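grep looks for the pattern anywhere in each string, so the "X10.1" at the start of all three elements is already a match. You need to specify that after one or more 1s, the match must reach the end of the string. You use $ to do this.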

grep("X10\\.1+$",c("X10.10","X10.11","X10.12"))
[1] 2
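To see exactly which part of each string the pattern matched, one option is base R's regexpr() together with regmatches(), which extracts the first match in each element. This is a sketch for illustration; for regexpr() results, regmatches() drops elements that did not match at all.

```r
x <- c("X10.10", "X10.11", "X10.12")

# Without $, 1+ is satisfied by a single 1, so every element matches
unanchored <- regmatches(x, regexpr("X10\\.1+", x))
print(unanchored)  # matched substrings: "X10.1", "X10.11", "X10.1"

# With $, the run of 1s has to reach the end of the string
anchored <- regmatches(x, regexpr("X10\\.1+$", x))
print(anchored)    # only "X10.11"

# grepl() reports the same thing as a logical vector
print(grepl("X10\\.1+$", x))  # FALSE, TRUE, FALSE
```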

Similarly, ^ matches the start of the string, so use it if you want the match to begin with X10. rather than, for instance, PX10., which the existing regex would also match.
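A minimal sketch of anchoring both ends, using "PX10.11" as a hypothetical value with an extra leading character:

```r
# "PX10.11" is a hypothetical value with an extra leading character
x <- c("X10.11", "PX10.11")

end_only <- grep("X10\\.1+$", x)    # nothing pins the start, so both match
both_ends <- grep("^X10\\.1+$", x)  # ^ requires the match to start at the first character

print(end_only)   # [1] 1 2
print(both_ends)  # [1] 1
```

With both ^ and $, the whole string has to consist of X10. followed by one or more 1s and nothing else.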
