Dr.Kameleon

Reputation: 22820

"OR" operator in RegEx syntax

OK, I've worked with RegEx numerous times but this is one of the things I honestly can't get my head around. And it looks as if I'm missing something rather simple...

So, let's say we want to match "AB" or "AC". In other words, "A" followed by either "B" OR "C". This could be expressed as A[BC], A[B|C], A(B|C), and so on.
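For single characters, a character class and an alternation group accept the same strings. The sketch below checks this with Python's re module, which stands in here for the regex engine the question uses. It assumes whole-string matching (re.fullmatch); `matches` is a hypothetical helper name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Both forms accept "A" followed by exactly one "B" or "C".
for pattern in (r"A[BC]", r"A(B|C)"):
    accepted = [t for t in ("AB", "AC", "AD", "A") if matches(pattern, t)]
    print(pattern, accepted)  # prints ['AB', 'AC'] for both patterns
```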

Now, what if A, B and C are not single letters but whole sub-expressions?


Please have a look at this example (well, I admit it doesn't look that... simple! lol): http://regexr.com?382a4

I'm trying to match capital = (and its variations) followed by either:

Why does the | operator only seem to apply to the latter part? (My regex also matches "Pattern 2" withOUT a preceding capital =.) Note that I've also tried positive lookarounds, without any success.

Any ideas?

Upvotes: 2

Views: 1044

Answers (3)

Mihai Stancu

Reputation: 16117

Actually, [B|C] is incorrect; (B|C) is correct.

Character classes

In regex jargon, [] is called a character class. It matches exactly one character, chosen from the options listed between the brackets.

In your case, [B|C] matches B, | or C, because | has no special meaning inside a character class. To match only B or C, use [BC]; it still matches exactly one character.
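A quick way to see that the pipe is taken literally inside brackets is to test the class against the string "|". This sketch uses Python's re module as a stand-in; PCRE treats these character classes the same way.

```python
import re

# Inside [...], "|" is an ordinary character, so [B|C] has three members.
print(re.fullmatch(r"[B|C]", "|") is not None)  # True
# [BC] contains only B and C.
print(re.fullmatch(r"[BC]", "|") is not None)   # False
print(re.fullmatch(r"[BC]", "C") is not None)   # True
# A character class always consumes exactly one character.
print(re.fullmatch(r"[BC]", "BC") is not None)  # False
```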

Capturing groups

In regex jargon, () is called a capturing group. It delimits a subexpression so that it is treated as a unit, and whatever it matches is captured: it appears in the matches array filled by preg_match and can be referenced as $1, $2 and so on in a preg_replace replacement string.
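The same capture behaviour can be sketched with Python's re module, used here as an analogue of PHP's preg_match and preg_replace. The input string "capital = Paris" is hypothetical. Group 0 is the whole match and group 1 is the first capturing group. In Python's replacement syntax the backreference is written \1 rather than PHP's $1.

```python
import re

# Hypothetical input; group 1 captures the word after "capital = ".
text = "capital = Paris"
m = re.search(r"capital = (\w+)", text)
if m:
    print(m.group(0))  # whole match: capital = Paris
    print(m.group(1))  # first capturing group: Paris

# In the replacement string, \1 refers back to what group 1 captured.
swapped = re.sub(r"capital = (\w+)", r"\1 is the capital", text)
print(swapped)  # Paris is the capital
```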

Within such a group you can use the | operator to match either whatever comes before it or whatever comes after it.

This lets you match strings of more than one character, such as (Ana|Maria), or whole structures, such as ([a-zA-Z]+|[0-9]+).
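Both examples from the paragraph above can be tried with Python's re.findall, used here as a stand-in for preg_match_all. When the pattern has exactly one capturing group, findall returns what that group captured on each match. The input strings are made up for illustration.

```python
import re

# Alternatives can be whole words...
names = re.findall(r"(Ana|Maria)", "Ana, Maria and Ioana")
print(names)   # ['Ana', 'Maria']

# ...or entire sub-patterns: a run of letters OR a run of digits.
tokens = re.findall(r"([a-zA-Z]+|[0-9]+)", "abc123def45")
print(tokens)  # ['abc', '123', 'def', '45']
```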

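You can also use | outside of a capturing group, as in (group-1)|(group-2), and you can nest groups, as in ((group-1)|(group-2)). Keep in mind that | has the lowest precedence of all regex operators, so it splits everything inside its enclosing group (or the whole pattern, at the top level) into alternatives.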
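On their own, (group-1)|(group-2) and ((group-1)|(group-2)) accept the same strings. The extra outer group only adds a capture that holds whichever alternative matched, and it shifts the numbering of the inner groups. This sketch uses Python's re, with "ab" and "cd" as placeholder sub-patterns.

```python
import re

flat = re.fullmatch(r"(ab)|(cd)", "cd")
nested = re.fullmatch(r"((ab)|(cd))", "cd")

# Flat: group 1 is (ab), group 2 is (cd).
print(flat.groups())    # (None, 'cd')
# Nested: group 1 is the outer group, groups 2 and 3 are the alternatives.
print(nested.groups())  # ('cd', None, 'cd')
```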

Upvotes: 1

Salman Arshad

Reputation: 272256

Your original regex could be summarized as:

capital = (ABC)|(DEF)

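This matches either capital = ABC or just DEF, because | splits the whole pattern into two alternatives. Add an extra pair of () that wraps the | clause, so the prefix applies to both alternatives: capital = ((ABC)|(DEF)).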
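The difference can be checked directly. This sketch uses Python's re.fullmatch, so the whole string must match. ABC and DEF are the placeholders from this answer, not the asker's real sub-patterns.

```python
import re

broken = r"capital = (ABC)|(DEF)"   # parsed as "capital = (ABC)" OR "(DEF)"
fixed = r"capital = ((ABC)|(DEF))"  # "capital = " followed by ABC or DEF

for text in ("capital = ABC", "capital = DEF", "DEF"):
    print(repr(text),
          re.fullmatch(broken, text) is not None,
          re.fullmatch(fixed, text) is not None)
# 'capital = ABC' True True
# 'capital = DEF' False True
# 'DEF' True False
```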


Upvotes: 2

MarcoS

Reputation: 17721

I suppose this regexp:

capital = (ABC|XYZ)

should work (if I understood your request correctly...)
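Here is a small check of this regex with Python's re.fullmatch. ABC and XYZ stand in for the asker's real sub-patterns. If the alternative itself does not need to be captured, a non-capturing group (?:...) groups the alternation in the same way without creating a capture.

```python
import re

# ABC and XYZ are placeholders for the real sub-patterns.
pattern = r"capital = (ABC|XYZ)"
results = {t: re.fullmatch(pattern, t) is not None
           for t in ("capital = ABC", "capital = XYZ", "XYZ")}
print(results)  # {'capital = ABC': True, 'capital = XYZ': True, 'XYZ': False}

# (?:...) groups the alternation without capturing anything.
m = re.fullmatch(r"capital = (?:ABC|XYZ)", "capital = XYZ")
print(m.groups())  # ()
```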

Upvotes: 1
