gotha

Reputation: 489

Regex OR matching stuff that I don't want

I am using PHP.

I have strings like:

example.123.somethingelse
example.1234.somethingelse
example.2015.123.somethingelse
example.2015.1234.somethingelse

and I came up with this regex:

/example\.(2015\.|)([0-9]{3,4})\./

What I want to get is "123" or "1234" and it works for these strings. But when the string is

example.2015.A01.somethingelse

the result is "2015".

The way I see it, "2015." is followed by "A", which the regex should not match, yet it does (I suppose there is a solid reason for this that I don't understand at the moment).

How can I fix it, so that the regex matches nothing when the string does not follow the same structure as the others?

Upvotes: 1

Views: 74

Answers (5)

alpha bravo

Reputation: 7948

Here is one more pattern:

example\.(?:2015\.)?\K(\d+)


Or, restricted to your specific number of digits:

example\.(?:2015\.)?\K(\d{3,4})
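As a rough illustration of what \K does here, the sketch below runs the second pattern through preg_match. The delimiters and the sample strings (taken from the question) are my additions. Because \K resets the start of the reported match, $m[0] holds only the digits. Note that the optional group can still be skipped on backtracking, so for the A01 string this sketch still yields 2015.

```php
<?php
// \K discards everything matched so far from the reported match,
// so $m[0] contains only what follows it.
$pattern = '/example\.(?:2015\.)?\K(\d{3,4})/';

$inputs = [
    'example.123.somethingelse',
    'example.2015.1234.somethingelse',
    'example.2015.A01.somethingelse',
];

$results = [];
foreach ($inputs as $input) {
    // Store the reported match, or null when the pattern does not match.
    $results[$input] = preg_match($pattern, $input, $m) ? $m[0] : null;
}

print_r($results);
```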

Upvotes: 0

Alan Moore

Reputation: 75242

Looks like you need a possessive quantifier:

/example\.(2015\.)?+([0-9]{3,4})\./

The 2015. is still optional, but once the regex has matched it, it won't give it up, even if that causes the match to fail. I'm assuming the substring you're trying to capture with ([0-9]{3,4}) can never have the value 2015. That is, you won't need to match something like this:

    example.2015.somethingelse

If that's not the case, it's going to be much more complicated.
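A minimal sketch of how this behaves in PHP, assuming the five sample strings from the question (the $inputs and $results names are only for illustration). The first four strings yield their 3 or 4 digit value from group 2, and the A01 string produces no match at all:

```php
<?php
// Possessive optional group: once "2015." is consumed it is never given back.
$pattern = '/example\.(2015\.)?+([0-9]{3,4})\./';

$inputs = [
    'example.123.somethingelse',
    'example.1234.somethingelse',
    'example.2015.123.somethingelse',
    'example.2015.1234.somethingelse',
    'example.2015.A01.somethingelse',
];

$results = [];
foreach ($inputs as $input) {
    // Group 2 holds the digits; null marks "no match".
    $results[$input] = preg_match($pattern, $input, $m) ? $m[2] : null;
}

print_r($results);
```

The same sketch would also report no match for example.2015.somethingelse, which is the limitation described above.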

Upvotes: 0

Peter Bowers

Reputation: 3093

Your regex is this:

/example\.(2015\.|)([0-9]{3,4})\./

That says:

  • First match "example" followed by a period
  • Then match either "2015" followed by a period OR nothing at all.
  • Then match 3 or 4 digits in a row followed by a period

When you have the string example.2015.A01.somethingelse, it first matches "example.2015." but then, as you said, the "A" messes it up, so it backtracks and matches just "example." (remember, the "OR" allows nothing to be matched). So it matches "example." followed by NOTHING followed by 3 or 4 numeric digits and a period. Since "2015" is 4 numeric digits, it comfortably matches "example.2015.".
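The backtracking described above can be observed directly by inspecting the capture groups. This is only a small sketch using the question's pattern and problem string; the variable names are mine:

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/example\.(2015\.|)([0-9]{3,4})\./';

$matched = preg_match($pattern, 'example.2015.A01.somethingelse', $m);

// $m[0] is the whole match, $m[1] the "2015. or nothing" group,
// $m[2] the group of 3 or 4 digits.
var_dump($matched, $m);
```

Group 1 comes back as an empty string and group 2 as "2015", which shows that the empty alternative was the one used.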

It's hard to tell from your description, but I think you've just got a misplaced vertical bar:

/example\.(2015\.)|([0-9]{3,4})\./

That should match EITHER "example.2015." OR numbers like 123 -- but "2015" is still 4 numeric digits in a row, so it will still match. I don't have a clear enough idea of the pattern to figure out how that could be avoided.

Upvotes: 1

eigenchris

Reputation: 5821

In your regex, you use the following:

(2015\.|)

This allows the regex to match either 2015. or the empty string (zero characters).

When the regex example\.(2015\.|)([0-9]{3,4})\. is applied to the following example:

example.2015.A01.somethingelse

it matches the literal characters example followed by a period, then matches the empty string with (2015\.|), and then uses ([0-9]{3,4})\. to match 2015 (4 numerical characters) and the period after it. Thus your expression matches the following:

example.2015.

Upvotes: 0

Roy Shmuli

Reputation: 5019

Maybe use \d+ and take the first result in the array.

Upvotes: 0
