nzcoops

Reputation: 9380

grep and regular expressions - meta/wildcard characters

This is a bare-minimum example from a larger, more complex dataset; I'm just trying to get my head around something.

> grep("X10\\.1+",c("X10.10","X10.11","X10.12"))
[1] 1 2 3

I would have expected only 2 to be returned, since '+' is supposed to mean '1 or more of the preceding element'. I thought escaping the period (which I have to deal with, so I want to keep it in the example) could have been causing the issue, so I tried without it:

> grep("X101+",c("X1010","X1011","X1012"))
[1] 1 2 3

So is my understanding of how '+' works wrong?

CONCLUSION:

Thanks @James. I had understood '+' to mean 'ANOTHER 1 or more of the preceding element', whereas it actually means 'JUST 1 or more of the preceding element'.

11+ would have done what I was thinking (requiring an ADDITIONAL one or more 1s after the first 1). Cheers
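For anyone landing here later, a small sketch of the difference between 1+ and 11+ on the vector from the question. The variable names are just illustrative, and the {2,} form is an equivalent spelling I'm adding for comparison, not something from the answers.

```r
# Same three strings as in the question
x <- c("X10.10", "X10.11", "X10.12")

# '1+' needs at least one '1' after 'X10.', which every element has
at_least_one <- grep("X10\\.1+", x)

# '11+' needs a '1' followed by one or more further '1's
at_least_two <- grep("X10\\.11+", x)

# Interval syntax: two or more '1's, same meaning as '11+'
at_least_two_braces <- grep("X10\\.1{2,}", x)

print(at_least_one)         # 1 2 3
print(at_least_two)         # 2
print(at_least_two_braces)  # 2
```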

Upvotes: 2

Views: 261

Answers (2)

DWright

Reputation: 9500

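In this case, the preceding element is the digit '1', which is present at that position in all 3 elements. Your prior understanding of '+' ('1 or more of the preceding element') is correct; because grep looks for the pattern anywhere in the string, a single '1' after 'X10.' is enough for a match.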
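To see which part of each string the pattern actually picks up, something like the following may help. It is only a sketch using base R's regexpr and regmatches; the variable names are made up for illustration.

```r
x <- c("X10.10", "X10.11", "X10.12")

# regexpr() finds the first match in each string;
# regmatches() extracts the matched text
matched <- regmatches(x, regexpr("X10\\.1+", x))

# Every element contains 'X10.1', so every element matches.
# '+' is greedy, so the second element yields 'X10.11'.
print(matched)  # "X10.1" "X10.11" "X10.1"
```

The trailing 0 or 2 is simply left outside the match, which is why all three indices come back from grep.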

Upvotes: 2

James

Reputation: 66834

You need to signify that, after one or more 1s, you want to match the end of the string. Use $ to do this.

grep("X10\\.1+$",c("X10.10","X10.11","X10.12"))
[1] 2

Similarly, ^ matches the start of the string. Use it if you want to require that the match starts with X10. rather than, for instance, PX10., which the existing regex would also match.
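A quick sketch combining both anchors. The test vector here is hypothetical, extended with a PX10. entry and a longer run of 1s to show the effect of each anchor.

```r
# Hypothetical test vector, not the one from the question
y <- c("X10.11", "PX10.11", "X10.111", "X10.12")

# End anchor only: 'PX10.11' still matches because the
# pattern may start anywhere in the string
end_only <- grep("X10\\.1+$", y)

# Both anchors: the whole string must be 'X10.' followed by only 1s
both_anchors <- grep("^X10\\.1+$", y)

print(end_only)      # 1 2 3
print(both_anchors)  # 1 3
```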

Upvotes: 6
