Aditya

Reputation: 225

Improving regex-based replace performance

Hello everyone. I want to ask about memory utilization and the time a process takes. I have the following code and want to optimize it so that it runs faster. String will take more memory; is there an alternative?

public String replaceSingleToWord(String strFileText) {

    strFileText = strFileText.replaceAll("\\b(\\d+)[ ]?'[ ]?(\\d+)\"", "$1 feet $2  ");
    strFileText = strFileText.replaceAll("\\b(\\d+)[ ]?'[ ]?(\\d+)''", "$1 feet $2     inch");

    //for 23o34'
    strFileText = strFileText.replaceAll("(\\d+)[ ]?(degree)+[ ]?(\\d+)'", "$1 degree $3 second");

    strFileText = strFileText.replaceAll("(\\d+((,|.)\\d+)?)sq", " $1 sq");

    strFileText = strFileText.replaceAll("(?i)(sq. Km.)", " sqkm");
    strFileText = strFileText.replaceAll("(?i)(sq.[ ]?k.m.)", " sqkm");
    strFileText = strFileText.replaceAll("(?i)\\s(lb.)", " pound");
    //for pound
    strFileText = strFileText.replaceAll("(?i)\\s(am|is|are|was|were)\\s?:", "$1 ");
    return strFileText;
}

I think it will take a lot of memory and time, and I want to reduce both. What changes do I need to make? Is there an alternative to the replaceAll function? How can I minimize this code so that it runs faster and with lower memory utilization? Thank you in advance.

Upvotes: 1

Views: 1637

Answers (4)

Joop Eggen

Reputation: 109547

The regex patterns can be improved in spots: [,.] instead of (,|.), or " ?" instead of [ ]?.

Use compiled static final Patterns outside the functions.

private static final Pattern PAT = Pattern.compile("...");


StringBuffer sb = new StringBuffer();
Matcher m = PAT.matcher(strFileText);
while (m.find()) {
    m.appendReplacement(sb, "...");
}
m.appendTail(sb);
strFileText = sb.toString();

This can be optimized by first testing m.find() before creating a new StringBuffer.
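As a minimal sketch of the above, using the first pattern from the question: the class name FeetInchReplacer and the replacement text "$1 feet $2 inch" are illustrative assumptions, not the asker's exact code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeetInchReplacer {
    // Compiled once when the class is loaded, not on every call.
    private static final Pattern FEET_INCH =
            Pattern.compile("\\b(\\d+) ?' ?(\\d+)\"");

    static String replace(String text) {
        Matcher m = FEET_INCH.matcher(text);
        if (!m.find()) {
            return text; // no match at all: skip allocating a buffer
        }
        StringBuffer sb = new StringBuffer(text.length() + 16);
        do {
            // $1 and $2 refer to the two captured numbers
            m.appendReplacement(sb, "$1 feet $2 inch");
        } while (m.find());
        m.appendTail(sb);
        return sb.toString();
    }
}
```

When nothing matches, the input string itself is returned, so no new objects are created for the common "nothing to replace" case.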

Upvotes: 0

maaartinus

Reputation: 46392

Use precompiled Pattern and a loop just like Joop Eggen suggested. Group your expressions together. For example, the first two can be written like

`"\\b(\\d++) ?' ?(\\d+)(?:''|\")"`

You can go much further at the expense of readability. A single expression for all your replacements is possible, too.

`"\\b(\\d++) ?(?:' ?(?:(\\d+)(?:''|\")|degree ?(\\d++)|...)"`

Then you need to branch on conditions like group(2) == null. This gets very hard to maintain, but with a single loop and cleverly written regex you'll win the race. :D
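A rough sketch of such a combined expression with branching, limited to the feet/inch and degree cases. The class name UnitReplacer and the group layout are my own assumptions; the replacement wording ("feet", "inch", "degree", "second") follows the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnitReplacer {
    // Group 1: leading number (both branches).
    // Group 2: inches, set only when the feet branch matched.
    // Group 3: trailing number, set only when the degree branch matched.
    private static final Pattern UNITS = Pattern.compile(
            "\\b(\\d++) ?(?:' ?(\\d++)(?:''|\")|degree ?(\\d++)')");

    static String replace(String text) {
        Matcher m = UNITS.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(2) != null) {
                replacement = m.group(1) + " feet " + m.group(2) + " inch";
            } else {
                replacement = m.group(1) + " degree " + m.group(3) + " second";
            }
            // quoteReplacement guards against '$' or '\' in the built string
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

The text is scanned once instead of once per pattern, which is where the gain comes from; the price is that every new case adds another group and another branch.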


What would the regex be for words like can't -> cannot, shouldn't -> should not, etc.?

It depends on how exact you want to be. The most trivial way is s.replaceAll("\\Bn't\\b", " not"). The above optimizations apply, so don't ever use replaceAll when performance matters.
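To see how inexact the trivial pattern is, here is a small sketch (the class name NaiveContractionDemo is made up for illustration). It handles regular forms but leaves irregular ones like can't and won't half-converted.

```java
import java.util.regex.Pattern;

class NaiveContractionDemo {
    // Same expression as the trivial replaceAll call, but compiled once.
    private static final Pattern NT = Pattern.compile("\\Bn't\\b");

    static String expand(String text) {
        return NT.matcher(text).replaceAll(" not");
    }

    public static void main(String[] args) {
        System.out.println(expand("shouldn't")); // should not
        System.out.println(expand("can't"));     // ca not
        System.out.println(expand("won't"));     // wo not
    }
}
```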

A general solution could go like this

Pattern SHORTENED_WORD_PATTERN =
    Pattern.compile("\\b(ca|should|wo|must|might)(n't)\\b");

String getReplacement(String trunk) {
    switch (trunk) { // needs Java 7
        case "wo": return "will not";
        case "ca": return "cannot";
        default: return trunk + " not";
    }
}

... and the relevant part of the replacer loop (see replaceAll)

    while (matcher.find()) {
        matcher.appendReplacement(result, getReplacement(matcher.group(1)));
    }

What should I do in the case of

    strFileText = strFileText.replace("Ã¡", "a");
    strFileText = strFileText.replace("â€™", "\'");
    strFileText = strFileText.replace("â€Â", "\'");
    strFileText = strFileText.replace("Ã³", "o");
    strFileText = strFileText.replace("Ã©", "e");
    strFileText = strFileText.replace("Ã¡", "a");
    strFileText = strFileText.replace("Ã§", "c");
    strFileText = strFileText.replace("Ãº", "u");

if I want to write this in one line or some other way? Is replaceEach() better for that case?

If you go for efficiency, note that nearly all of the above strings start with the same character Ã. A single regex like Ã¡|Ã³|Ã©|... is much slower than Ã(¡|³|©|...) (unless the regex engine can optimize it itself, which is currently not the case).

So write a regex where all common prefixes are extracted and use

String getReplacement(String match) {
    switch (match) { // needs Java 7
        case "Ã¡": return "a";
        case "â€™": return "'";
        ...
        default: throw new IllegalArgumentException("Unexpected: " + match);
    }
}

and

    while (matcher.find()) {
        matcher.appendReplacement(result, getReplacement(matcher.group()));
    }

Maybe a HashMap might be faster than the switch above.
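A sketch of the HashMap variant with the common prefix factored out. It covers only the two-character sequences that start with Ã (the garbled forms of á, ó, é, ç and ú); the class name MojibakeFixer and the choice of entries are assumptions for illustration. The strings are written as Unicode escapes so the source file stays plain ASCII.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MojibakeFixer {
    // Garbled two-character sequence -> intended plain letter.
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("\u00C3\u00A1", "a"); // garbled a-acute
        REPLACEMENTS.put("\u00C3\u00B3", "o"); // garbled o-acute
        REPLACEMENTS.put("\u00C3\u00A9", "e"); // garbled e-acute
        REPLACEMENTS.put("\u00C3\u00A7", "c"); // garbled c-cedilla
        REPLACEMENTS.put("\u00C3\u00BA", "u"); // garbled u-acute
    }

    // Shared first character once, then a character class for the second one.
    private static final Pattern BROKEN =
            Pattern.compile("\u00C3[\u00A1\u00B3\u00A9\u00A7\u00BA]");

    static String fix(String text) {
        Matcher m = BROKEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The pattern only matches keys of the map, so get() is never null.
            String replacement = REPLACEMENTS.get(m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Whether the map actually beats the switch would have to be measured; for a handful of entries the difference is likely small.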

Upvotes: 1

Dariusz

Reputation: 22241

Optimization methods:

  • use Pattern.compile() for each replace. Create a class, make the patterns fields, and compile them only once. That saves a lot of time, because replaceAll() compiles the regex on every call, which is a very costly operation
  • use non-greedy regexes. Instead of (\\d+) use (\\d+?).
  • try not to use regexes where possible (lb. -> pound)
  • merge several regexes with the same substitution into one; this applies to your sqkm or feet replaces
  • you could try to base your API on StringBuilder (StringBuffer before Java 9), then use appendReplacement to process your text.
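For fixed strings such as lb. the regex engine can be avoided entirely. A minimal sketch using indexOf and a StringBuilder (LiteralReplacer and replaceLiteral are hypothetical names); note that it is case-sensitive, unlike the (?i) patterns in the question.

```java
class LiteralReplacer {
    static String replaceLiteral(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // nothing sensible to search for
        }
        int idx = text.indexOf(target);
        if (idx < 0) {
            return text; // no occurrence: no allocation
        }
        StringBuilder sb = new StringBuilder(text.length());
        int start = 0;
        while (idx >= 0) {
            // copy the text before the occurrence, then the replacement
            sb.append(text, start, idx).append(replacement);
            start = idx + target.length();
            idx = text.indexOf(target, start);
        }
        sb.append(text, start, text.length());
        return sb.toString();
    }
}
```

Usage would be something like replaceLiteral(strFileText, " lb.", " pound").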

Moreover, the dot in many of your replaces is unescaped. An unescaped dot matches any character. Use \\. to match a literal dot.
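A small sketch of the difference, based on the lb. pattern from the question (the class name DotEscapeDemo and the sample sentence are made up):

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Unescaped dot: "lb" followed by ANY character.
    private static final Pattern UNESCAPED = Pattern.compile("(?i)\\slb.");
    // Escaped dot: only a literal "lb.".
    private static final Pattern ESCAPED = Pattern.compile("(?i)\\slb\\.");

    static String unescaped(String text) {
        return UNESCAPED.matcher(text).replaceAll(" pound");
    }

    static String escaped(String text) {
        return ESCAPED.matcher(text).replaceAll(" pound");
    }

    public static void main(String[] args) {
        String text = "5 lb. of flour and an lbw decision";
        System.out.println(unescaped(text)); // 5 pound of flour and an pound decision
        System.out.println(escaped(text));   // 5 pound of flour and an lbw decision
    }
}
```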

Class idea:

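import java.util.regex.Pattern;

class RegexProcessor {
  // compiled once per instance instead of on every replaceAll() call
  private final Pattern feet1rep = Pattern.compile("\\b(\\d+)[ ]?'[ ]?(\\d+)\"");
  // ...

  public String process(String org) {
    String mod = feet1rep.matcher(org).replaceAll("$1 feet $2  ");
    // ...
    return mod;
  }
}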

Upvotes: 3

SSP

Reputation: 2670

The StringBuffer and StringBuilder classes are used when a lot of modifications have to be made to strings of characters.

Unlike Strings, objects of type StringBuffer and StringBuilder can be modified over and over again without leaving behind a lot of new unused objects.

The StringBuilder class was introduced in Java 5. The main difference between StringBuffer and StringBuilder is that StringBuilder's methods are not thread-safe (not synchronized).

It is recommended to use StringBuilder whenever possible because it is faster than StringBuffer. However, if thread safety is necessary, StringBuffer is the better option.

public class Test{

    public static void main(String args[]){
       StringBuffer sBuffer = new StringBuffer(" test");
       sBuffer.append(" String Buffer");
       System.out.println(sBuffer);
    }
}




public class StringBuilderDemo {
    public static void main(String[] args) {
        String palindrome = "Dot saw I was Tod";

        StringBuilder sb = new StringBuilder(palindrome);

        sb.reverse();  // reverse it

        System.out.println(sb);
    }
}

So, depending on your needs, you can select one of them.

Reference http://docs.oracle.com/javase/tutorial/java/data/buffers.html

Upvotes: 1
