Biscuit128

Reputation: 5398

Efficient Text Processing in Java

I have created an application to process log files, but I hit a bottleneck once it is processing around 20 files.

The problem comes from one particular method, which takes roughly a second on average to complete. As you can imagine, this isn't practical when it needs to run more than 50 times.

private String getIdFromLine(String line){
    String[] values = line.split("\t");
    String newLine = substringBetween(values[4], "Some String : ", "Value=");
    String[] split = newLine.split(" ");
    return split[1].substring(4, split[1].length());
}



private String substringBetween(String str, String open, String close) {
    if (str == null || open == null || close == null) {
        return null;
    }
    int start = str.indexOf(open);
    if (start != -1) {
        int end = str.indexOf(close, start + open.length());
        if (end != -1) {
            return str.substring(start + open.length(), end);
        }
    }
    return null;
}

Each line comes from reading a file, which is already very efficient, so I haven't posted that code; I can add it if anyone asks.

Is there any way to improve the performance of this at all?

Thanks for your time

Upvotes: 1

Views: 544

Answers (6)

Andremoniy

Reputation: 34900

One of the main problems in this code is the use of String.split(). For example, this version replaces it with indexOf():

    private String getIdFromLine3(String line) {
        // Find the tab that precedes field 4 (values[4] in the split() version)
        int t_index = -1;
        for (int i = 0; i < 4; i++) {
            t_index = line.indexOf('\t', t_index + 1);
            if (t_index == -1) return null;
        }
        // Field 4 ends at the next tab, or at the end of the line
        int f_end = line.indexOf('\t', t_index + 1);
        if (f_end == -1) f_end = line.length();
        String newLine = substringBetween(line.substring(t_index + 1, f_end), "Some String : ", "Value=");
        if (newLine == null) return null;
        // Equivalent of newLine.split(" ")[1]: the text between the first and second spaces
        int p_index = newLine.indexOf(' ');
        if (p_index == -1) return null;
        int p_index2 = newLine.indexOf(' ', p_index + 1);
        if (p_index2 == -1) p_index2 = newLine.length();
        String split = newLine.substring(p_index + 1, p_index2);
        // Drop the 4-character prefix, as in split[1].substring(4)
        return split.substring(4);
    }

Update: this version can be about three times faster.
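
To check that an indexOf()-based rewrite really returns the same IDs as the split()-based original, and to get a rough feel for the speed difference, something like the following could be used. It is only a sketch: the sample line format (tab-separated, with field 4 containing `Some String : <token> <4-char prefix><id> Value=...`) is an assumption, and the class name `IdExtractionCheck` is hypothetical. A single System.nanoTime() loop gives only a rough indication; a dedicated benchmark harness such as JMH gives more reliable numbers.

```java
public class IdExtractionCheck {

    // The question's split()-based version
    static String bySplit(String line) {
        String[] values = line.split("\t");
        String newLine = substringBetween(values[4], "Some String : ", "Value=");
        String[] split = newLine.split(" ");
        return split[1].substring(4);
    }

    // indexOf()-based version: no String[] allocations
    static String byIndexOf(String line) {
        int t = -1;
        for (int i = 0; i < 4; i++) {            // locate the tab before field 4
            t = line.indexOf('\t', t + 1);
            if (t == -1) return null;
        }
        int fieldEnd = line.indexOf('\t', t + 1);
        if (fieldEnd == -1) fieldEnd = line.length();
        String newLine = substringBetween(line.substring(t + 1, fieldEnd), "Some String : ", "Value=");
        if (newLine == null) return null;
        int p1 = newLine.indexOf(' ');           // same token as newLine.split(" ")[1]
        if (p1 == -1) return null;
        int p2 = newLine.indexOf(' ', p1 + 1);
        if (p2 == -1) p2 = newLine.length();
        return newLine.substring(p1 + 1, p2).substring(4);
    }

    // The question's helper, unchanged in behaviour
    static String substringBetween(String str, String open, String close) {
        if (str == null || open == null || close == null) return null;
        int start = str.indexOf(open);
        if (start == -1) return null;
        int end = str.indexOf(close, start + open.length());
        return end == -1 ? null : str.substring(start + open.length(), end);
    }

    public static void main(String[] args) {
        // Hypothetical sample line; adjust to the real log format
        String line = "2013-01-01\tINFO\tthread-1\tcom.example\tSome String : abc key=12345 Value=7\tend";
        System.out.println(bySplit(line) + " " + byIndexOf(line)); // prints "12345 12345"

        int n = 1_000_000;
        long sink = 0;                           // consume results so the JIT cannot drop the calls
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += bySplit(line).length();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += byIndexOf(line).length();
        long t2 = System.nanoTime();
        System.out.printf("split: %d ms, indexOf: %d ms (sink=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }
}
```

If the two methods ever disagree on real log lines, the difference is in edge cases (missing fields, extra spaces), which the indexOf() version reports as null instead of throwing.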

Upvotes: 1

IDKFA

Reputation: 536

Could you try a regex anyway and post the results, just for comparison?

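import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once and reused. Remove the first and last groups if they are not needed (and adjust m.group(x) to match).
private static final Pattern p = Pattern.compile("(Some String : )(.*?)(Value=)");

@Test
public void test2(){
    String str = "Long java line with Some String : and some object with Value=154345 ";
    System.out.println(substringBetween(str));
}

private String substringBetween(String str) {
    Matcher m = p.matcher(str);
    // find() scans from the start of the input; find(2) would reset the matcher and skip the first two characters
    if (m.find()) {
        return m.group(2); // the text between the two markers
    } else {
        return null;
    }
}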

If this turns out to be faster, try to find a single regex that does the work of both methods.
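
One possible single regex that skips to field 4 and extracts the ID in one pass is sketched below. It assumes a hypothetical line layout (tab-separated, with field 4 containing `Some String : <token> <4-char prefix><id> ... Value=`), and the class name `CombinedRegex` is illustrative. The pattern only approximates the original two-step logic, so edge cases such as extra spaces or missing fields may behave differently; compare both its output and its speed against the original on real log lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedRegex {

    // Compiled once and reused for every line.
    //   ^(?:[^\t]*\t){4}        skip fields 0-3 and their tabs
    //   [^\t]*?Some String :    lazily scan field 4 up to the marker
    //   [^ \t]* [^ \t]{4}       first token, a space, then the 4-character prefix
    //   ([^ \t]*)               the ID (group 1)
    //   [^\t]*?Value=           require "Value=" later in the same field
    private static final Pattern ID = Pattern.compile(
            "^(?:[^\\t]*\\t){4}[^\\t]*?Some String : [^ \\t]* [^ \\t]{4}([^ \\t]*)[^\\t]*?Value=");

    static String getId(String line) {
        Matcher m = ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample line; adjust to the real log format
        String line = "2013-01-01\tINFO\tthread-1\tcom.example\tSome String : abc key=12345 Value=7\tend";
        System.out.println(getId(line)); // prints 12345
    }
}
```

Because `[^\t]` can never cross a tab, the regex cannot accidentally match markers that sit in a different field.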

Upvotes: 0

gk5885

Reputation: 3762

A few things are likely problematic:

  1. Whether or not you realized it, you are using regular expressions: the argument to String.split() is treated as a regex. (Since Java 7, split() takes a fast path for a single-character delimiter such as "\t" or " " that is not a regex metacharacter, but it still allocates a new array and a new String for every field.) Using String.indexOf() will almost certainly be a faster way to find the particular portion of the String that you want. As HRgiger points out, Guava's Splitter is a good choice because it does just that.

  2. You're allocating a lot of objects you don't need. Depending on how long your lines are, you could be creating a ton of extra Strings and String[]s (and then garbage-collecting them). That's another reason to avoid String.split().

  3. I also recommend using String.startsWith() and String.endsWith() rather than all of the indexOf() work you're doing, if only because it would be easier to read.
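
Point 1 is easy to demonstrate: because split() interprets its argument as a regular expression, a delimiter that happens to be a regex metacharacter, such as ".", behaves very differently from indexOf("."). The class name `SplitIsRegex` and the sample string are illustrative only.

```java
import java.util.regex.Pattern;

public class SplitIsRegex {
    public static void main(String[] args) {
        String s = "a.b";

        // "." is a regex meaning "any character", so every character is a delimiter
        // and all the resulting empty tokens are dropped as trailing empty strings.
        System.out.println(s.split(".").length);                 // prints 0

        // Quoting the delimiter makes split() treat it literally.
        System.out.println(s.split(Pattern.quote(".")).length);  // prints 2

        // indexOf() always searches for the literal character and allocates no array.
        int dot = s.indexOf('.');
        System.out.println(s.substring(0, dot) + " " + s.substring(dot + 1)); // prints "a b"
    }
}
```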

Upvotes: 3

HRgiger

Reputation: 2790

Google Guava's Splitter is pretty fast as well.

Upvotes: 0

Roman K

Reputation: 3337

I would recommend using VisualVM to find the bottleneck before optimising.
If you need performance in your application, you will need profiling anyway.

As an optimisation, I would write a custom loop to replace your substringBetween method and get rid of the multiple indexOf calls.
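
A minimal sketch of that idea: one pass over the line that counts tabs to stay inside the wanted field and uses startsWith(prefix, offset) to spot the open and close markers, instead of a split() followed by separate indexOf() calls. The class and method names (`SingleScan`, `substringBetweenInField`) and the field index 4 are assumptions based on the question's code. Whether this actually beats indexOf(), which the JVM may already optimise well, is something to measure (for example with VisualVM or a benchmark) rather than assume.

```java
public class SingleScan {

    /**
     * Returns the text between `open` and the next `close` inside tab-separated
     * field number `field` (0-based), or null if either marker is missing there.
     * Assumes `open` and `close` are non-empty and contain no tab characters.
     */
    static String substringBetweenInField(String line, int field, String open, String close) {
        int tabs = 0;     // index of the field currently being scanned
        int start = -1;   // position just after `open`, once it has been found
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                if (tabs == field) return null;   // left the target field without a match
                tabs++;
                continue;
            }
            if (tabs != field) continue;          // still before the target field
            if (start == -1) {
                if (line.startsWith(open, i)) {
                    start = i + open.length();
                    i = start - 1;                // resume scanning right after `open`
                }
            } else if (line.startsWith(close, i)) {
                return line.substring(start, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample line; adjust to the real log format
        String line = "2013-01-01\tINFO\tthread-1\tcom.example\tSome String : abc key=12345 Value=7\tend";
        System.out.println("[" + substringBetweenInField(line, 4, "Some String : ", "Value=") + "]");
        // prints [abc key=12345 ]
    }
}
```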

Upvotes: 0

Jan Krakora

Reputation: 2610

I would try to use regular expressions.

Upvotes: 2
