Reputation: 489
I am using PHP.
I have strings like:
example.123.somethingelse
example.1234.somethingelse
example.2015.123.somethingelse
example.2015.1234.somethingelse
and I came up with this regex
/example\.(2015\.|)([0-9]{3,4})\./
What I want to get is "123" or "1234" and it works for these strings. But when the string is
example.2015.A01.somethingelse
the result is "2015".
The way I see it, "2015." is followed by "A", so the regex should not match, but it does (I suppose there is a solid reason for it that I don't understand at the moment).
How can I fix it, so that the regex matches nothing when the string does not follow the same structure as the others?
Upvotes: 1
Views: 74
Reputation: 7948
Here is one more pattern:
example\.(?:2015\.)?\K(\d+)
or, limited to your specific number of digits:
example\.(?:2015\.)?\K(\d{3,4})
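A minimal sketch of what \K does in the second pattern, using the sample strings from the question. \K drops everything matched so far from the reported match, so $m[0] holds only the digits. Note that the optional group can still be given up by backtracking, so this pattern on its own still reports "2015" for the problematic string. The variable names are only for illustration.

```php
<?php
// Pattern from the answer above, wrapped in delimiters for preg_match().
$kPattern = '/example\.(?:2015\.)?\K(\d{3,4})/';

$kSubjects = [
    'example.123.somethingelse',
    'example.2015.1234.somethingelse',
    'example.2015.A01.somethingelse',
];

$kResults = [];
foreach ($kSubjects as $subject) {
    // $m[0] starts after the \K, so the "example." prefix is not part of it.
    $kResults[$subject] = preg_match($kPattern, $subject, $m) ? $m[0] : null;
}

// The last entry is still "2015": the optional group backtracks to empty.
print_r($kResults);
```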
Upvotes: 0
Reputation: 75242
Looks like you need a possessive quantifier:
/example\.(2015\.)?+([0-9]{3,4})\./
The "2015." is still optional, but once the regex has matched it, it won't give it up, even if that causes the match to fail. I'm assuming the substring you're trying to capture with ([0-9]{3,4}) can never have the value 2015. That is, you won't need to match something like this:
example.2015.somethingelse
If that's not the case, it's going to be much more complicated.
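As a rough check of the possessive version against the strings from the question, something like the following could be used. The helper name extractNumber is hypothetical and exists only for this sketch; it returns the second capture group, or null when the pattern does not match.

```php
<?php
// Hypothetical helper: returns the 3-4 digit group, or null if there is no match.
function extractNumber(string $subject): ?string
{
    // "?+" makes the optional group possessive: once "2015." has been
    // consumed, the engine will not backtrack and try the empty alternative.
    $pattern = '/example\.(2015\.)?+([0-9]{3,4})\./';

    return preg_match($pattern, $subject, $m) ? $m[2] : null;
}

var_dump(extractNumber('example.123.somethingelse'));       // 3 digits, no year prefix
var_dump(extractNumber('example.2015.1234.somethingelse')); // year prefix, then 4 digits
var_dump(extractNumber('example.2015.A01.somethingelse'));  // "A01" is not numeric
var_dump(extractNumber('example.2015.somethingelse'));      // the case the answer assumes away
```

The last two calls return null: in both, "2015." is consumed by the possessive group and nothing numeric follows it.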
Upvotes: 0
Reputation: 3093
Your regex is this:
/example\.(2015\.|)([0-9]{3,4})\./
That says: match "example.", then either "2015." or nothing, then 3 or 4 numeric digits, then a period.
When you have the string example.2015.A01.somethingelse it matches the "example.2015." but then, as you said, the "A" messes it up, so it backtracks and matches just "example." (remember the "OR" allowed for nothing to be matched). So it matches "example." followed by NOTHING followed by 3 or 4 numeric digits and a period -- since "2015" is 4 numeric digits, it comfortably matches "example.2015.".
It's hard to tell from your description, but I think you've just got a misplaced vertical bar:
/example\.(2015\.)|([0-9]{3,4})\./
That should match EITHER "example.2015." OR numbers like 123 -- but "2015" is still 4 numeric digits in a row, so it will still match. I don't have a clear enough idea of the pattern to figure out how that could be avoided.
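To illustrate how that top-level vertical bar splits the whole pattern into two independent alternatives, here is a small sketch. The third subject, foo.4567.bar, is made up for this example to show that the second alternative does not require "example." at all.

```php
<?php
// With the bar at the top level, the alternatives are
//   example\.(2015\.)      and      ([0-9]{3,4})\.
$altPattern = '/example\.(2015\.)|([0-9]{3,4})\./';

// First alternative: "example." followed by "2015.".
preg_match($altPattern, 'example.2015.A01.somethingelse', $first);

// Second alternative: 3 or 4 digits and a period, found later in the string.
preg_match($altPattern, 'example.123.somethingelse', $second);

// Second alternative again: no "example." anywhere in the subject.
preg_match($altPattern, 'foo.4567.bar', $third);

var_export([$first[0], $second[0], $third[0]]);
```

So the problematic string still produces a match here, which is consistent with the caveat in the paragraph above.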
Upvotes: 1
Reputation: 5821
In your regex, you use the following:
(2015\.|)
This allows the regex to match either 2015. or the empty string (zero characters).
When the regex example\.(2015\.|)([0-9]{3,4})\. is applied to the following example:
example.2015.A01.somethingelse
it first matches the literal characters example., then (after trying 2015\. and backtracking) the empty string with (2015\.|), and then uses ([0-9]{3,4})\. to match the string 2015., which is 4 numerical characters followed by a period. Thus your expression matches the following:
example.2015.
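This can be confirmed by inspecting the capture groups that preg_match() fills in for the original pattern. The sketch below is only a way to look at the groups; the variable names are illustrative.

```php
<?php
// Original pattern from the question.
$origPattern = '/example\.(2015\.|)([0-9]{3,4})\./';

preg_match($origPattern, 'example.2015.A01.somethingelse', $groups);

// $groups[0] is the whole match, $groups[1] the "2015. or nothing" group,
// and $groups[2] the 3-4 digit group.
var_export($groups);
```

Group 1 comes back as an empty string and group 2 as "2015", which shows that the year was consumed by the digit group rather than by the optional prefix.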
Upvotes: 0