Reputation: 12460
let's say I have reg exp of (a|b)*(ab)+
I know that *
means 0 or more and +
means 1 or more and |
means either or?
so, aab
, ab
, abab
, bbbbbbbbbbbbab
and aaaaaaaaaaaaaaab
would work.
Trying to understand these notation if it was (a|b)*|(ab)+
a alone would work, correct?
But a alone wouldnt work for (a|b)*(ab)+
Upvotes: 3
Views: 334
Reputation: 38456
You are correct across the board.
To draw it out more clearly, or at least as best as I can do without an actual image, we can look at the different parts in sections.
(a|b)
This will match a
or b
. Now, if you add a *
(named "kleene star") to this, it will match 0 or more times:
(a|b)*
As this is in the beginning of your regex, it's stating that you can have any repeating combination of a
and/or b
at the beginning of your input.
The second group:
(ab)
This requires that there must be an a
followed by a b
. Adding a +
makes it occur 1 or more times:
(ab)+
So, as this is at the end of your regex, it states that you may have one or more repeating ab
sequences at the end of our string.
Combined into (a|b)*(ab)+
, you can have any combination of a
and b
, as long as your input ends with at least one ab
.
If you were to add a |
between the two sets, the first set to match (with the one on the left of the |
evaluating first) would be the matched set.
With this, (a|b)*|(ab)+
can match just a
because (a|b)
can match just a
- and this group is on the left of the |
so it's evaluated first.
Upvotes: 2
Reputation: 4633
(a|b)*|(ab)+
(a|b)*
means 0 or more characters that can be a or b. In other words, any combination of 'a' and 'b'. (ab)+
means 1 or more 'ab'. Place the |
between them and it means one or the other. So yes, 'a' alone would work.
(a|b)*(ab)+
The first portion is the same, 0 or more characters that can be 'a' or 'b'. However there is no '|' which means concatenation. So after your sequence of 'a' and 'b', it is followed by 1 or more 'ab'.
I suggest you read this. It explains the three operations in formal regular expressions: kleene star, altercation, and concatenation. As well as how real-world regex engines work.
Upvotes: 1
Reputation: 116538
Let's translate.
(a|b)*(ab)+
means:
Therefore, any of ab
, bab
, aab
, abab
, aaababbab
would work; but a
, b
, aaa
, bbb
, bbba
, <empty>
would not.
(a|b)*|(ab)+
means:
Therefore, all of the above would match because they would all match the first alternative (a|b)*
. Some also just happen to match the second alternative (ab)+
but it actually doesn't matter - the second alternative is completely covered by the first and will never be checked!
Upvotes: 1
Reputation: 44289
You are right.
(a|b)*|(ab)+
can match a
only, because it will take the first alternative ((a|b)*
) and then match exactly one a
. In fact, this regex would even match an empty string.
(a|b)*(ab)+
on the other hand cannot choose anything, it may match 0 (a|b)
but then there has to be at least one ab
.
Note also, that the first regex is equivalent to (a|b)*
and [ab]*
, because this already covers the possibility of repeated (ab)
. (In fact there is a slight difference in terms of the capturing subgroups, but this is probably beyond what is relevant or applicable for you).
Upvotes: 4