Marius Schulz
Marius Schulz

Reputation: 16470

Simplifying Regular Expression Quantifiers

I'm currently working on a library that simplifies creating regular expression patterns.

To generate the most legible patterns, I'd like to simplify quantifiers where possible. Assume the following subpattern has been emitted:

(?:\d+)?

The above can be simplified to \d*, but is it also correct to assume that (?:ℝ+)? can always be simplified to ℝ*, where is an arbitrary (parenthesized, if necessary) regular expression?

If yes, the same should hold true for the following, right?

Upvotes: 2

Views: 106

Answers (2)

John Kugelman
John Kugelman

Reputation: 362137

I'd suggest converting the quantifiers into (m, n) repetition counts:

  • * is (0, ∞),
  • + is (1, ∞), and
  • ? is (0, 1).

Then work out the rules for combining quantifiers. In general, what is (ℝ{m1, n1}){m2, n2}? Off the top of my head, I'd guess you could simplify that to ℝ{m1·m2, n1·n2} if neither m1 nor m2 is greater than 1.

Upvotes: 3

Bergi
Bergi

Reputation: 665517

Yes, you're correct. And you should always use the smaller one, as the nested repetitions are prone to catastrophic backtracking. So apart from different execution behavior, they will match the same languages.

Upvotes: 1

Related Questions