ealeon
ealeon

Reputation: 12460

Regular Expression * |

let's say I have reg exp of (a|b)*(ab)+ I know that * means 0 or more and + means 1 or more and | means either or?

so, aab, ab, abab, bbbbbbbbbbbbab and aaaaaaaaaaaaaaab would work.

Trying to understand these notation if it was (a|b)*|(ab)+ a alone would work, correct?
But a alone wouldnt work for (a|b)*(ab)+

Upvotes: 3

Views: 334

Answers (4)

newfurniturey
newfurniturey

Reputation: 38456

You are correct across the board.

To draw it out more clearly, or at least as best as I can do without an actual image, we can look at the different parts in sections.

(a|b)

This will match a or b. Now, if you add a * (named "kleene star") to this, it will match 0 or more times:

(a|b)*

As this is in the beginning of your regex, it's stating that you can have any repeating combination of a and/or b at the beginning of your input.

The second group:

(ab)

This requires that there must be an a followed by a b. Adding a + makes it occur 1 or more times:

(ab)+

So, as this is at the end of your regex, it states that you may have one or more repeating ab sequences at the end of our string.

Combined into (a|b)*(ab)+, you can have any combination of a and b, as long as your input ends with at least one ab.

If you were to add a | between the two sets, the first set to match (with the one on the left of the | evaluating first) would be the matched set.

With this, (a|b)*|(ab)+ can match just a because (a|b) can match just a - and this group is on the left of the | so it's evaluated first.

Upvotes: 2

Austin Henley
Austin Henley

Reputation: 4633

(a|b)*|(ab)+

(a|b)* means 0 or more characters that can be a or b. In other words, any combination of 'a' and 'b'. (ab)+ means 1 or more 'ab'. Place the | between them and it means one or the other. So yes, 'a' alone would work.

(a|b)*(ab)+

The first portion is the same, 0 or more characters that can be 'a' or 'b'. However there is no '|' which means concatenation. So after your sequence of 'a' and 'b', it is followed by 1 or more 'ab'.

I suggest you read this. It explains the three operations in formal regular expressions: kleene star, altercation, and concatenation. As well as how real-world regex engines work.

Upvotes: 1

lc.
lc.

Reputation: 116538

Let's translate.

  1. (a|b)*(ab)+ means:

    • Zero or more of ("a" or "b")
    • Followed by one or more of "ab"

    Therefore, any of ab, bab, aab, abab, aaababbab would work; but a, b, aaa, bbb, bbba, <empty> would not.

  2. (a|b)*|(ab)+ means:

    • Zero or more of ("a" or "b")
    • Or by one or more of "ab"

    Therefore, all of the above would match because they would all match the first alternative (a|b)*. Some also just happen to match the second alternative (ab)+ but it actually doesn't matter - the second alternative is completely covered by the first and will never be checked!

Upvotes: 1

Martin Ender
Martin Ender

Reputation: 44289

You are right.

(a|b)*|(ab)+ can match a only, because it will take the first alternative ((a|b)*) and then match exactly one a. In fact, this regex would even match an empty string.

(a|b)*(ab)+ on the other hand cannot choose anything, it may match 0 (a|b) but then there has to be at least one ab.

Note also, that the first regex is equivalent to (a|b)* and [ab]*, because this already covers the possibility of repeated (ab). (In fact there is a slight difference in terms of the capturing subgroups, but this is probably beyond what is relevant or applicable for you).

Upvotes: 4

Related Questions