Reputation: 9380
This is a bare-minimum example from a larger, more complex dataset; I'm just trying to get my head around something.
> grep("X10\\.1+",c("X10.10","X10.11","X10.12"))
[1] 1 2 3
Now I would have expected only 2 to have been returned, since '+' is supposed to mean '1 or more of the preceding element'. I thought escaping the period (which I have to deal with, so I want to keep it in the example) could have been causing the issue, but I get the same result without it:
> grep("X101+",c("X1010","X1011","X1012"))
[1] 1 2 3
So, my understanding of the functionality of '+' is wrong?
CONCLUSION:
Thanks @James. So my understanding was that + meant 'ANOTHER 1 or more of the preceding element', as opposed to what it actually means, which is 'JUST 1 or more of the preceding element'.
11+ would have done what I was thinking (requiring an ADDITIONAL 1 or more 1s after the first 1). Cheers
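For reference, a minimal sketch of the 11+ idea on the same three strings. The variable names x and hits are just placeholders for illustration:

```r
# Same strings as in the question
x <- c("X10.10", "X10.11", "X10.12")

# "11+" means a literal 1 followed by one or more further 1s,
# so at least two consecutive 1s are required after the period
hits <- grep("X10\\.11+", x)
print(hits)
# [1] 2
```

Only the second string has two 1s after the period, so it is the only one returned.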
Upvotes: 2
Views: 261
Reputation: 9500
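In this case, the preceding element is the digit '1', which is present at that position in all 3 elements. Your prior understanding of '+' is correct; the pattern only has to match somewhere within each string, so the trailing 0 or 2 does not prevent a match.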
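One way to see this is to pull out the part of each string that the pattern actually matches. This is a small sketch using base R's regexpr and regmatches; the names x and matched are placeholders:

```r
x <- c("X10.10", "X10.11", "X10.12")

# regexpr() locates the first match in each string;
# regmatches() extracts the matched substring
matched <- regmatches(x, regexpr("X10\\.1+", x))
print(matched)
# [1] "X10.1"  "X10.11" "X10.1"
```

Every string contains at least "X10.1", which already satisfies "1 or more", so all three are reported as matches.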
Upvotes: 2
Reputation: 66834
You need to signify that after any number of 1s, you want to match the end of the string. You use $ to do this.
grep("X10\\.1+$",c("X10.10","X10.11","X10.12"))
[1] 2
Similarly, ^ matches the start of the string, if you want to require that the match starts with X10. rather than, for instance, PX10., which would also be matched by the existing regex.
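A short sketch of both anchors together. The extra element "PX10.11" is a made-up value added here only to show the effect of ^; the variable names are placeholders:

```r
# "PX10.11" is a hypothetical extra value, not from the original data
x <- c("X10.10", "X10.11", "X10.12", "PX10.11")

# Anchored at the end only: the prefixed string still matches
end_only <- grep("X10\\.1+$", x)
print(end_only)
# [1] 2 4

# Anchored at both ends: the whole string must be X10. followed by 1s
both_ends <- grep("^X10\\.1+$", x)
print(both_ends)
# [1] 2
```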
Upvotes: 6