Reputation: 21
Quick question, if a is a regular expression then is it true that a* = (a+)* ?
Is (a+)* a valid expression? If it is, then can anyone explain why is it the same as a*? I apologize for asking here, but I couldn't find anything via Google.
Upvotes: 1
Views: 622
Reputation: 28292
It is true that L(a*) = L((a+)*)
. We can prove this by showing L(a*)
is a subset of L((a+)*)
and vice versa.
To show that L(a*)
is a subset of L((a+)*)
we must show that anything generated by a*
is also generated by (a+)*
. We need provide only one method of generation. The regular expression a*
generates strings e = a^0, a = a^1, aa = a^2, ..., a^k, ..., for all integers n. To generate any of these, it suffices to choose the generated substring a
from the subexpression a+
and replace, which yields the same expression a*
and obviously generates the same strings of a's in the same way.
To show that L((a+)*)
is a subset of L(a*)
, we need only point out that the only alphabet symbol in the expression (a+)*
is a
, and so that expression cannot generate anything but strings of a's. Because a*
generates all such strings, it is similarly clear that L((a+)*)
is a subset or L(a*)
.
Because L(a*)
and L((a+)*)
are subsets of each other, the sets must be equal. That is, te expressions generate the same language and are therefore equivalent.
Upvotes: 2
Reputation: 26129
No, (a+)*
matches the same string as a*
, but it is an anti-pattern because of ReDoS. On its own the (a+)*
is not harmful, but many regex engines can be frozen with the "aaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
string if you use the (a+)*x
pattern. Another difference that you have a capturing group in (a+)*
.
Upvotes: -1
Reputation: 5805
Yes, (a+)*
is valid and equivalent to a*
. The first expression means "a sequence of at least one a
, repeated 0 or more times", and the second means "one a
, repeated 0 or more times". Clearly both are equivalent.
Upvotes: 0