Georges Lteif
Georges Lteif

Reputation: 17

Regex expression taking too much time

I have the below regex expression in a Java code, it is taking a good deal of time to complete on some cases. Is there a way to improve it?

String decimal = "([0-9]+(\\.[0-9]+)?[/-]?)+";
String units = "(in|ft)\\.?";
String unitName = "(cu\\.? *ft|gauge|watt|rpm|ft|lbs|K|GPF|btu|mph|cfm|volt|oz|pounds|dbi|miles|amp|hour|kw|f|degrees|year)";

    sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + units + " *x? *" + decimal + " *" + units + " *x? *" + decimal + " *" + units + ""));
    sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + units + " *x? *" + decimal + " *" + units));
    sizePattern.add(Pattern.compile("(?i)" + decimal + " *x *" + decimal + " *" + units));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "( *" + units + ")"));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "( *sq?\\.?)( *ft?\\.?)"));
    sizePattern.add(Pattern.compile("(?i)" + decimal + " *" + unitName));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "(d)"));
    sizePattern.add(Pattern.compile("(?i)" + decimal + "( *(%|percent))"));
    sizePattern.add(Pattern.compile("(?i)" + decimal));

    for (Pattern p : sizePattern)
    {
        ODebug.Write(Level.FINER, "PRD-0079: Using pattern = " + p.pattern());

        m = p.matcher(_data);
        while (m.find()) 
        {
            ODebug.Write(Level.FINER, "           Got => [" + m.group(0) + "]");
            this.Dimensions.add(m.group(0));

            _data = _data.replaceAll("\\Q" + m.group(0) + "\\E", ".");
            m = p.matcher(_data);
        }
    }

String causing the issue:

Micro-Induction Cooktop provides the best in cooktop performance, safety and efficiency. Induction heats as electricity flows through a coil to produce a magnetic field under the ceramic plate. When a ferromagnetic cookware is placed on the ceramic surface, currents are induced in the cookware and instant heat is generated due to the resistance of the pan. Heat is generated to the pan only and no heat is lost. As there are no open flames, inductions are safer to use than conventional burners. Once cookware is removed, all molecular activity ceases and heating is stopped immediately.Flush surface for built-in or freestanding applicationDual functions: Cook and Warm7 power settings (100-300-500-700-900-1100-1300W)* The 2 lowest power settings cannot be actually achieved, but are ""simulated"":100W = 500W intermittently heat for 2 seconds and stop for 8 seconds300W = 500W intermittently heat for 6 seconds and stop for 4 seconds13 Keep Warm settings (100-120-140-160-180-190-210-230-250-280-300-350-390F)Touch sensitive panel with control lockUp to 8 hours timerMicro-crystal ceramic plateAutomatic pan detectionLED panelETL/ETL-Sanitation/FCC certified for household or commercial useHome Depot Protection Plan:

Upvotes: 0

Views: 993

Answers (1)

maaartinus
maaartinus

Reputation: 46492

Assuming your _data is long, it's not the matching what takes the time, but rather the assignment

_data = _data.replaceAll("\\Q" + m.group(0) + "\\E", ".");

which is O(n**2) in terms of the string length. Just don't do it.

You could do it simpler with

_data = _data.replace(m.group(0), ".");

but just don't do it. Do you need a reduced _data at the end? If so, use a single replaceAll per pattern (it uses a StringBuffer internally and is only linear in the size of the string).

Additionally:

  • Use non-capturing groups.
  • Consider recycling the Matcher by using reset(CharSequence) and usePattern(Pattern).
  • Consider combining all the patterns into one. As all of them start the same, it could be quite efficient.
  • Your decimal can probably get slow in case there's no match. Leaving out the optional part, you get "([0-9]+)+" which can backtrack needlessly a lot. Consider using atomic groups.

Upvotes: 1

Related Questions