Reputation: 399
So I'm using a third-party application that uses regex to find matches. By default it returns only the first match, since it is only looking for one piece of information per page. I cannot change this setting unless I want it to return all matches as an array, which I rarely want, and which I don't want for this match.
What I want it to find are ID codes. It just so happens that all the IDs start with 10 and are followed by 4 more digits.
Example:
104230
So I wrote this regex:
10[0-9]{4}
The only problem is that there is a .js file in the header named 10022008.js, and since the application automatically takes the first match, every ID gets set to the first six digits of that file name (100220).
How do you get regex to ignore that string of numbers, and that string only? I have searched for similar "ignore"-type patterns, but none of the ones I found have worked.
Upvotes: 1
Views: 14246
Reputation: 7598
Lookahead is one solution. May not be the most efficient, but I think it is the most readable.
10\d{4}(?!08\.js)
This will match 10 followed by any four digits, provided that those digits are not followed by 08.js.
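To see the lookahead at work, here is a minimal sketch in Python. The application in the question is unknown, so Python's re module only stands in for it, and the page string is a made-up sample, not the real page.

```python
import re

# Hypothetical page text: the script name appears before the real ID.
page = '<script src="10022008.js"></script> ... ID: 104230'

# Without the lookahead, the first match comes from the file name.
naive = re.search(r"10[0-9]{4}", page)

# With the negative lookahead, "100220" is rejected because "08.js" follows it.
lookahead = re.search(r"10\d{4}(?!08\.js)", page)

print(naive.group())      # 100220
print(lookahead.group())  # 104230
```

Note that this only excludes that one file name. Any other longer number starting with 10 would still match.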
Upvotes: 2
Reputation: 49
I'm not sure what the input data looks like, but could you limit it to the beginning and end of line?
^10[0-9]{4}$
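Whether this works depends on the ID sitting alone on its own line and on how the application treats ^ and $. As a rough sketch in Python (an assumption, since the application's regex engine is unknown, and the page string is invented), the anchors only match at line boundaries when multiline mode is on:

```python
import re

# Hypothetical page text with the ID alone on its own line.
page = "header 10022008.js\n104230\nfooter"

# By default, ^ and $ only match at the start and end of the whole string.
default_mode = re.search(r"^10[0-9]{4}$", page)

# With re.MULTILINE, ^ and $ also match at the start and end of each line.
multiline = re.search(r"^10[0-9]{4}$", page, re.MULTILINE)

print(default_mode)       # None
print(multiline.group())  # 104230
```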
Upvotes: -1
Reputation: 424983
Add the "word boundary" regex \b to each end of your regex:
\b10[0-9]{4}\b
The word boundary matches between any "word" character (i.e. \w, which is [0-9a-zA-Z_]) and any non-word character, or vice versa, and is zero-width, so it won't add any characters to your capture.
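A quick way to check this is the short Python sketch below. Python's re module is used only as a stand-in for the application's engine, and both sample strings are made up for illustration.

```python
import re

# Hypothetical page text: the script name appears before the real ID.
page = '<script src="10022008.js"></script> ... ID: 104230'

# In "10022008", the six digits "100220" are followed by another digit,
# so the trailing \b fails there and the search moves on to the real ID.
match = re.search(r"\b10[0-9]{4}\b", page)
print(match.group())  # 104230

# Caveat: letters and "_" are word characters too, so an ID glued to them
# has no boundary in front of it and is not matched.
glued = re.search(r"\b10[0-9]{4}\b", "id_104230")
print(glued)  # None
```

Unlike the lookahead, this rejects any longer run of digits, not just that one file name.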
Upvotes: 5