DockYard

Reputation: 1040

Confusion regarding * operator in regular expression

I know the * operator means any number of occurrences of the preceding character(s).

So an expression ab* would generate strings like "ab", "abab", etc. But it also generates the string "a", and I don't get the logic. Is it that the * operator applies only to the single character preceding it? By that logic, * applies only to 'b' in this example, and when b is repeated 0 times the resulting string is "a". Please help.

Edit: ab* would not generate strings like "abab" as I said above. It generates only strings like ab, abb, abbb, etc.

Upvotes: 2

Views: 154

Answers (2)

Nir Levy

Reputation: 4740

I know the * operator means any number of occurrences of the preceding character(s).

The * operator means 0 or more occurrences of the preceding expression. In your case the expression before * is b (since in regexp each char is an expression). So ab* will match

a (0 "b" expressions)
ab (1 "b" expression)
abb (2 "b" expressions)
abab (only the leading "ab" is matched: 1 "b" expression, followed by an extra "ab" that lies outside the match. Note that `^ab*$` will not match `abab`, since it is anchored to the start and end of the line.)
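The difference between finding a match inside a string and matching the whole string can be checked with a short sketch. The question does not name a regex engine, so this assumes Python's standard `re` module; the variable names are only illustrative.

```python
import re

# ab*: an "a" followed by zero or more "b" characters
pattern = re.compile(r"ab*")

# fullmatch requires the entire string to match, like ^ab*$
full = {s: bool(pattern.fullmatch(s)) for s in ["a", "ab", "abb", "abbb", "abab"]}

# search finds the first match anywhere in the string;
# in "abab" it stops after the first "ab" because the next character is "a"
found = pattern.search("abab").group()

print(full)
print(found)
```

Here `fullmatch` accepts a, ab, abb and abbb but rejects abab, while `search` on abab succeeds and returns only the leading "ab".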

If you want to match ab zero or more times, you have to treat ab as a single expression by using parentheses, like so: (ab)*.

This part of Wikipedia explains it better than me.

Upvotes: 2

Socowi

Reputation: 27205

So an expression ab* would generate strings like "ab", "abab", etc.

That's not correct. ab* matches only a, ab, abb, abbb, abbbb, ...

Is it something that the * operator considers only 1 character preceding to it

Exactly.

If you want to apply * to ab, then you have to group it: (ab)*
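As a rough illustration of the effect of grouping, the sketch below compares the two patterns on a handful of sample strings. It assumes Python's `re` module (the thread does not say which engine is in use), and the sample list is made up for the example.

```python
import re

single = re.compile(r"ab*")      # * applies to "b" only
grouped = re.compile(r"(ab)*")   # * applies to the whole group "ab"

samples = ["", "a", "ab", "abb", "abab", "ababab"]

# Keep only the samples that each pattern matches in full
single_ok = [s for s in samples if single.fullmatch(s)]
grouped_ok = [s for s in samples if grouped.fullmatch(s)]

print(single_ok)
print(grouped_ok)
```

Note that (ab)* also accepts the empty string, since zero repetitions of the group are allowed, whereas ab* always needs at least the leading "a".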

Upvotes: 3
