Barry Brown

Reputation: 20604

Make a Perl-style regex interpreter behave like a basic or extended regex interpreter

I am writing a tool to help students learn regular expressions. I will probably be writing it in Java.

The idea is this: the student types in a regular expression and the tool shows which parts of a text will get matched by the regex. Simple enough.

But I want to support several different regex "flavors", such as basic, extended, and Perl-style regular expressions.

Java has the java.util.regex package, but it supports only Perl-style regular expressions, which are a superset of the basic and extended REs. What I think I need is a way to take any given regular expression and escape the meta-characters that aren't part of a given flavor. Then I could give it to a Pattern object and it would behave as if it were written for the selected RE interpreter.

For example, given the following regex:

^\w+[0-9]{5}-(\d{4})?$

As a basic regular expression, it would be interpreted as:

^\\w\+[0-9]\{5\}-\(\\d\{4\}\)\?$

As an extended regular expression, it would be:

^\\w+[0-9]{5}-(\\d{4})?$

And as a Perl-style regex, it would be the same as the original expression.

Is there a "regular expression for regular expressions" that I could run through a regex search-and-replace to quote the characters that are not meta-characters in the chosen flavor? What else could I do? Are there alternative Java classes I could use?

Upvotes: 1

Views: 1190

Answers (5)

dawg

Reputation: 103824

If your target is a Unix / Linux system, why not just shell out to the definitive host of each regex? That is, use grep for BRE, egrep for ERE, perl for PCRE, and so on. The only thing your module would need to do is the UI. Most of the decent regex testers that I have seen use a variant of this approach.

If you want yet another library suggestion, look at TRE for the BRE / ERE / POSIX / AWK part. It does not support back references, so PCRE / Python / Ruby / JS / Java is out...
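A minimal sketch of the shell-out idea in Java, since that is the language the question mentions. It assumes a grep on the PATH that accepts -o (print only the matched parts), -G (basic) and -E (extended), as GNU and BSD grep do. The class and method names (GrepMatcher, matches) are made up for illustration, and the code is only meant for small inputs because it writes all of stdin before reading any output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: lets the system grep decide what a BRE or ERE matches.
class GrepMatcher {

    // Returns every matched substring, in the order grep reports them.
    static List<String> matches(String regex, String text, boolean extended) {
        try {
            // Assumption: grep supports -o, -G and -E (true for GNU and BSD grep).
            Process p = new ProcessBuilder(
                    "grep", "-o", extended ? "-E" : "-G", "-e", regex).start();

            // Feed the sample text on stdin, then close it so grep sees EOF.
            try (OutputStream stdin = p.getOutputStream()) {
                stdin.write(text.getBytes(StandardCharsets.UTF_8));
            }

            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exit = p.waitFor();
            if (exit > 1) { // 0 = matched, 1 = no match, anything higher = error
                String err = new String(p.getErrorStream().readAllBytes(), StandardCharsets.UTF_8);
                throw new IllegalStateException("grep failed: " + err.trim());
            }
            return out.isEmpty() ? List.of() : List.of(out.split("\n"));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("a+", "a+ aa", false)); // '+' is literal in a BRE
        System.out.println(matches("a+", "a+ aa", true));  // '+' is a quantifier in an ERE
    }
}
```

The same pattern gives different matches depending on the flag, which is exactly the difference a teaching tool would want to show.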

Upvotes: 1

anjanb

Reputation: 13867

If you want your students to learn regex, why not use a freely available tool? Regex Coach (http://www.weitz.de/regex-coach/) is pretty good for learning and evaluating regexes.

Also look at this SO thread on a similar issue: https://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world


Upvotes: 0

Markus Jarderot

Reputation: 89171

I have written something similar: Is there a regular expression to detect a valid regular expression?

You could take part of that expression, and match each token separately:

[^?+*{}()[\]\\]                # literal characters
\\[A-Za-z]                     # Shorthand classes and letter escapes (\w, \d, \b, ...)
\\\d+                          # Back references
\\\W                           # Escaped characters
\[\^?(?:\\.|[^\\])+?\]         # Character class
\((?:\?[:=!>]|\?<[=!])?        # Beginning of a group
\)                             # End of a group
(?:[?+*]|\{\d+(?:,\d*)?\})\??  # Repetition
\|                             # Alternation

For each match, you could have some dictionary of appropriate replacements in the target flavor.
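A rough Java sketch of that idea follows. It uses a simplified three-token pattern (bracket expression, escaped character, anything else) rather than the full list above, and the names FlavorTranslator, Flavor and toJavaRegex are invented for illustration. It follows the question's reading that \w and \d are not escapes in basic or extended REs, and it assumes bracket expressions can be passed through unchanged, which is not true for every input (POSIX treats a backslash inside brackets as a literal character).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical translator: rewrites a regex written in a given flavor
// into the equivalent java.util.regex syntax.
class FlavorTranslator {

    enum Flavor { BASIC, EXTENDED, PERL }

    // One alternative per token kind; order matters (first match wins).
    private static final Pattern TOKEN = Pattern.compile(
            "(?<bracket>\\[\\^?\\]?[^\\]]*\\])"   // [...] bracket expression
          + "|(?<escaped>\\\\.)"                  // backslash + one character
          + "|(?<other>.)",                       // any other single character
            Pattern.DOTALL);

    static String toJavaRegex(String regex, Flavor flavor) {
        if (flavor == Flavor.PERL) {
            return regex; // Java already speaks (roughly) this dialect
        }
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(regex);
        while (m.find()) {
            String tok = m.group();
            if (m.group("bracket") != null) {
                out.append(tok); // assumption: passed through unchanged
            } else if (m.group("escaped") != null) {
                char c = tok.charAt(1);
                if (Character.isLetter(c)) {
                    // \w, \d etc. are treated as a literal backslash plus the letter
                    out.append("\\\\").append(c);
                } else if (flavor == Flavor.BASIC && "(){}".indexOf(c) >= 0) {
                    out.append(c); // \( \) \{ \} are the meta forms in a BRE
                } else {
                    out.append(tok); // \. \* \\ and back references stay as they are
                }
            } else {
                char c = tok.charAt(0);
                if (flavor == Flavor.BASIC && "+?(){}|".indexOf(c) >= 0) {
                    out.append('\\').append(c); // literal in a BRE, meta in Java
                } else {
                    out.append(tok);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String regex = "^\\w+[0-9]{5}-(\\d{4})?$"; // the example from the question
        System.out.println(toJavaRegex(regex, Flavor.BASIC));    // ^\\w\+[0-9]\{5\}-\(\\d\{4\}\)\?$
        System.out.println(toJavaRegex(regex, Flavor.EXTENDED)); // ^\\w+[0-9]{5}-(\\d{4})?$
    }
}
```

The two printed lines reproduce the basic and extended interpretations given in the question. Edge cases such as a trailing lone backslash, a leading * in a BRE, or POSIX classes like [[:alpha:]] are not handled here.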

Upvotes: 1

Manu

Reputation: 29143

Check out this post for a 'regular expression for regular expressions': Is there a regular expression to detect a valid regular expression?

You can use this as a basis for your module.

Upvotes: 1

toolkit

Reputation: 50237

Alternatively, you could use Jakarta ORO?

This supports the following regex 'flavors':

  • Perl5 compatible regular expressions
  • AWK-like regular expressions
  • glob expressions

Upvotes: 1
