brienna
brienna

Reputation: 1604

Most efficient way to parse a body of text to match two regexs?

I'm new to Java and want to understand what I'm doing. Is there a more efficient way to program this? I have a body of text in connectionInfo that I broke into lines to parse for the matcher. Could I have retrieved the matching text from connectionInfo without breaking it into lines? Is this an efficient way to match two strings? Did I need to create two separate Matcher objects for this?

// Parse rid and rtoe (estimated time to completion)
String rid = null;
String rtoe = null;
Pattern ridPattern = Pattern.compile("RID = (.*$)");
Pattern rtoePattern = Pattern.compile("RTOE = (.*$)");
Matcher ridMatcher;
Matcher rtoeMatcher;

String[] lines = connectionInfo[0].split("\n");
for (String line : lines) {
    ridMatcher = ridPattern.matcher(line);
    rtoeMatcher = rtoePattern.matcher(line);
    if (ridMatcher.find()) {
        rid = ridMatcher.group(1);
    }
    if (rtoeMatcher.find()) {
        rtoe = rtoeMatcher.group(1);
    }
}

System.out.println("Request ID: " + rid);
System.out.println("Estimated time to completion: " + rtoe + " seconds");

Upvotes: 0

Views: 42

Answers (2)

John Bollinger
John Bollinger

Reputation: 180715

Could I have retrieved the matching text from connectionInfo without breaking it into lines?

You can match against input strings that contain newlines, so in that sense, yes. For the $ anchor to match before line terminators in addition to at the end of the input, you will want to compile your pattern with the MULTILINE option, or else put it directly into the pattern. The . metacharacter will not match newlines by default (though there's an option for that), so you don't have to worry about your capturing groups spanning multiple lines.

Is this an efficient way to match two strings?

It would be more efficient to skip the line splitting and iteration and just match against each pattern once. It would also be more convenient then to detect the error condition that one or both doesn't match anywhere:

Pattern ridPattern = Pattern.compile("RID = (.*)$", Pattern.MULTILINE);
Pattern rtoePattern = Pattern.compile("RTOE = (.*)$", Pattern.MULTILINE);
Matcher ridMatcher = ridPattern.matcher(connectionInfo[0]);
Matcher rtoeMatcher = rtoePattern.matcher(connectionInfo[0]);

if (ridMatcher.find()) {
    rid = ridMatcher.group(1);
}  // else error: no RID
if (rtoeMatcher.find()) {
    rtoe = rtoeMatcher.group(1);
}  // else error: no RTOE

System.out.println("Request ID: " + rid);
System.out.println("Estimated time to completion: " + rtoe + " seconds");

Did I need to create two separate Matcher objects for this?

A Matcher is specific to a particular Pattern, so with the Patterns as you have given them, yes, you need two Matchers. This is probably not something to worry about, but if the code presented runs so frequently that the performance or amount of garbage produced is a bona fide problem, then there are ways you could mitigate it:

  • Compile each Pattern once only for the whole class, recording it in a static field. Pattern compilation is comparatively expensive, so this could be a significant win if your code is a bottleneck.

  • If and only if the method is protected from concurrent invocation, create a single Matcher for each pattern, likewise recorded in static fields, and bind them at need to each new input via their one-arg reset() methods. This will reduce the amount of garbage produced, but it's only worth it if this method is making a significant contribution to a huge problem with GC.

Upvotes: 1

ShaneNal
ShaneNal

Reputation: 41

You're on your way to having a good solution, but there are some small things you can adjust to make it better.

matcher.group(1) will return null if it can't find the pattern, so we don't need to check and make sure that it will find it ( because you are starting at null anyway ). You're correct that we don't need two matchers as well.

// Parse rid and rtoe (estimated time to completion)
String rid = null;
String rtoe = null;
Pattern ridPattern = Pattern.compile("RID = (.*$)");
Pattern rtoePattern = Pattern.compile("RTOE = (.*$)");

String[] lines = connectionInfo[0].split("\n");
for (String line : lines) {
    Matcher patternMatcher = ridPattern.matcher(line);
    rid = patternMatcher.group(1);
    patternMatcher = rtoePattern.matcher(line);
    rtoe = patternMatcher.group(1);
}

System.out.println("Request ID: " + rid);
System.out.println("Estimated time to completion: " + rtoe + " seconds");

However, you may noticed some repeating code inside the for loop. Generally this is a bad idea. What we can do is make a little helper method to handle that logic:

private static String getGroupFromPattern( Pattern pattern, String line ) {
    Matcher patternMatcher = pattern.matcher(line);
    return patternMatcher.group(1);
}

Then your original logic would become:

// Parse rid and rtoe (estimated time to completion)
String rid = null;
String rtoe = null;
Pattern ridPattern = Pattern.compile("RID = (.*$)");
Pattern rtoePattern = Pattern.compile("RTOE = (.*$)");

String[] lines = connectionInfo[0].split("\n");
for (String line : lines) {
    rid = getGroupFromPattern(ridPattern, line);
    rtoe = getGroupFromPattern(rtoePattern, line);
}

System.out.println("Request ID: " + rid);
System.out.println("Estimated time to completion: " + rtoe + " seconds");

We could probably get away with only using a single Pattern object too, but I like the way you have it originally. Only because it is easier to read then a couple String references being feed a Pattern. Hope this helps!

One last note: Are you use you want to be using matcher.group(1)? This will give you the second grouping, and not the first. If you want the first use matcher.group(0), or matcher.group() for short hand.

Upvotes: 0

Related Questions