Hearen

Reputation: 7838

How can I write a regex pattern so it is easier to understand and maintain?

A regex pattern like this:

".*/.*/.*/.*/.*/.*/(.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*"

is really hard to maintain.

I am wondering whether there is something like:

".*<userName>/.*<envName>/.*<serviceName>/.*<dataType>/.*<date>/.*<host>/(.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*<fileName>"

to make the regex easier to read and understand?

Updated 2018-12-07

Thanks to @Liinux for the help: this is called free-spacing mode, and a simple Java demo would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FreeSpacingDemo {
    public static void main(String[] args) {
        // Each comment inside the pattern ends with \n, because a # comment runs to the end of its line.
        String re = "(?x)"
                + "# (?x) is the free-spacing flag\n"
                + "# everything from a # to the end of the line is ignored\n"
                + "# in free-spacing mode, whitespace between regular expression tokens is ignored\n"
                + "((?:19|20)\\d\\d)   # year (group 1); the inner group is non-capturing\n"
                + "[-/\\.]             # separator\n"
                + "(\\d{2})            # month (group 2)\n"
                + "[-/\\.]             # separator\n"
                + "(\\d{2})            # day (group 3)";
        Pattern pattern = Pattern.compile(re);
        Stream.of("2018-12-07", "2018.12.07", "2018/12/07").forEach(aTest -> {
            System.out.println("**************** Testing: " + aTest);
            final Matcher matcher = pattern.matcher(aTest);
            if (matcher.find()) {
                // Group 0 is the whole match, so the capturing groups start at 1.
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.println("Group - " + i + ": " + matcher.group(i));
                }
            }
        });
    }
}

Upvotes: 0

Views: 49

Answers (2)

Liinux

Reputation: 173

You can add comments to a regex using free-spacing mode, if your language supports it. In free-spacing mode, whitespace between regex tokens is ignored (caveats apply: for example, a literal space or # has to be escaped), and you can add comments using the # sign.

Example from a tutorial:

# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d                # year (group 1)
[- /.]                     # separator
(0[1-9]|1[012])            # month (group 2)
[- /.]                     # separator
(0[1-9]|[12][0-9]|3[01])   # day (group 3)
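For Java, the "caveats apply" part can be sketched roughly as below. Pattern.COMMENTS is the flag form of the inline (?x); the class and method names are made up for this illustration. The point is that unescaped whitespace vanishes from the pattern, so a literal space or # has to be escaped.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows how free-spacing mode treats spaces and '#'.
final class FreeSpacingCaveats {
    // Unescaped whitespace is dropped, so "a b" behaves like "ab".
    static boolean unescapedSpace(String input) {
        return Pattern.compile("a b", Pattern.COMMENTS).matcher(input).matches();
    }

    // A backslash-escaped space is kept as a literal space.
    static boolean escapedSpace(String input) {
        return Pattern.compile("a\\ b", Pattern.COMMENTS).matcher(input).matches();
    }

    // An unescaped # starts a comment; \# matches a literal '#'.
    static boolean escapedHash(String input) {
        return Pattern.compile("a\\#b   # trailing comment", Pattern.COMMENTS)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(unescapedSpace("ab"));   // true
        System.out.println(unescapedSpace("a b"));  // false
        System.out.println(escapedSpace("a b"));    // true
        System.out.println(escapedHash("a#b"));     // true
    }
}
```

How whitespace inside a character class such as [- /.] is handled differs between regex flavors, so it is worth testing a pattern like the one above in the engine you actually use.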

Upvotes: 1

melpomene

Reputation: 85827

If you're using Perl, you can just enable the /x flag and put whitespace and comments in your regex:

qr{
    .*  # userName
    /
    .*  # envName
    /
    .*  # serviceName
    /
    .*  # dataType
    /
    .*  # date
    /
    .*  # host
    /
    (.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*  # fileName
}x

That said, all of those .* should probably be [^/]* if that's what you mean (a sequence of non-slash characters).
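To see why the difference matters, here is a small Java sketch (the class and method names are invented for the example). With .* a "segment" can swallow slashes, so a path with an extra directory still matches and the groups end up holding the wrong pieces; with [^/]* the same input is rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing .* with [^/]* for slash-separated segments.
final class SegmentMatchDemo {
    static final Pattern GREEDY = Pattern.compile("(.*)/(.*)");
    static final Pattern NO_SLASH = Pattern.compile("([^/]*)/([^/]*)");

    // Returns group 1 if the whole input matches, otherwise null.
    static String firstSegment(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstSegment(GREEDY, "a/b/c"));    // a/b  (group 1 spans a slash)
        System.out.println(firstSegment(NO_SLASH, "a/b/c"));  // null (three segments rejected)
        System.out.println(firstSegment(NO_SLASH, "a/b"));    // a
    }
}
```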

You could also build up the pattern from variables with sensible names:

my $userName =
my $envName =
my $serviceName =
my $dataType =
my $date =
my $host = qr{[^/]*};

my $fileName = qr{(.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*};

...
qr{$userName/$envName/$serviceName/$dataType/$date/$host/$fileName}
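Since the question's demo is in Java, the same idea might look roughly like the sketch below. It combines the build-from-parts approach with named capturing groups, (?<name>...), which Java supports since version 7 and which come close to the <userName> placeholders asked for. The class name, the parse helper, and the sample path are assumptions made up for this example; the file-name part is copied from the question unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: assembles the path pattern from named pieces.
final class PathPattern {
    // One named group per directory level, in order.
    private static final String[] SEGMENTS =
            {"userName", "envName", "serviceName", "dataType", "date", "host"};

    // File-name part from the question; "fileName" is the text before the timestamp.
    private static final String FILE_NAME =
            "(?<fileName>.*)-\\d{2}\\.\\d{2}\\.\\d{2}.\\d{4}.*";

    static final Pattern PATTERN = Pattern.compile(build());

    // Produces (?<userName>[^/]*)/(?<envName>[^/]*)/ ... followed by FILE_NAME.
    private static String build() {
        StringBuilder sb = new StringBuilder();
        for (String name : SEGMENTS) {
            sb.append("(?<").append(name).append(">[^/]*)/");
        }
        return sb.append(FILE_NAME).toString();
    }

    // Returns the named fields, or an empty map if the path does not match.
    static Map<String, String> parse(String path) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PATTERN.matcher(path);
        if (m.matches()) {
            for (String name : SEGMENTS) {
                fields.put(name, m.group(name));
            }
            fields.put("fileName", m.group("fileName"));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample path, only for illustration.
        System.out.println(parse("alice/prod/billing/logs/2018-12-07/host01/app-12.07.18.1234.log"));
    }
}
```

Reading a value back by name, for example m.group("host"), also avoids counting parentheses whenever the pattern changes.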

Upvotes: 2
