Reputation: 41
I have a problem that requires me to parse a text file on my local machine. There are a few complications:
I've written some simple code using BufferedReader, String.indexOf and String.substring (to get item 3).
The file contains a key (pattern) named code= that occurs many times in different blocks. The program reads each line of the file using BufferedReader.readLine. It uses indexOf to check whether the pattern appears, then extracts the text after the pattern and appends it to a common string.
When I ran my program on a 600 MB file, I noticed that performance was very poor while it processed the file. I read an article on CodeRanch saying that the Scanner class isn't performant for large files.
Are there any techniques or a library that could improve my performance?
Thanks in advance.
Here's my source code:
String codeC = "code=[";
String source = "";
try {
    FileInputStream f1 = new FileInputStream("c:\\Temp\\fo1.txt");
    DataInputStream in = new DataInputStream(f1);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    boolean bPrnt = false;
    int ln = 0;
    // Read File Line By Line
    while ((strLine = br.readLine()) != null) {
        // Print the line on the console if it contains the pattern
        int idx = strLine.indexOf(codeC);
        if (idx != -1) {
            ln++;
            System.out.println(strLine + " ---- register : " + ln);
            // Keep only the text after the pattern
            strLine = strLine.substring(idx + codeC.length());
            source = source + "\n" + strLine;
        }
    }
    System.out.println("");
    System.out.println("Lines :" + ln);
    f1.close();
} catch ( ... ) {
    ...
}
Upvotes: 4
Views: 685
Reputation: 41
It works perfectly!
I followed the advice of OldCurmudgeon, Marko Topolnik and AlexWien, and my performance improved by 1000%. Before, the program took 2 hours to complete the described operation and write the result to a file. Now it takes 5 minutes, and the System.out.println calls are still in the source code!
I think the reason for the big improvement is changing the String "source" to a HashSet "source", as OldCurmudgeon suggested. But I also removed DataInputStream and used br.close().
Thanks, guys!
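For anyone reading later, here is a minimal sketch of what the changed loop could look like. OldCurmudgeon's answer isn't quoted in this post, so this is a reconstruction from the description above rather than the exact code; the class and method names (CodeCollector, collect) are made up for illustration. Appending to a String in a loop copies the whole accumulated text on every match, while adding to a set does not.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

public class CodeCollector {
    // The marker from the question; everything after it on a line is kept.
    private static final String MARKER = "code=[";

    // Reads the input line by line and returns the distinct texts that follow the marker.
    // (Hypothetical helper, not taken from any answer in this thread.)
    public static Set<String> collect(Reader input) throws IOException {
        Set<String> source = new HashSet<>();
        // try-with-resources closes the outermost reader, and with it the wrapped input.
        try (BufferedReader br = new BufferedReader(input)) {
            String line;
            while ((line = br.readLine()) != null) {
                int idx = line.indexOf(MARKER);
                if (idx != -1) {
                    // Adding to a set avoids re-copying one ever-growing String.
                    source.add(line.substring(idx + MARKER.length()));
                }
            }
        }
        return source;
    }
}
```

It could be called as CodeCollector.collect(new FileReader("c:\\Temp\\fo1.txt")). Keep in mind that a HashSet drops duplicates and does not keep the file order, so it only fits if that is acceptable; a StringBuilder or a List would be the alternative if every occurrence matters.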
Upvotes: 0
Reputation: 200168
This code of yours is highly suspicious and may well account for at least a part of your performance issues:
FileInputStream f1 = new FileInputStream("c:\\Temp\\fo1.txt");
DataInputStream in = new DataInputStream(f1);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
You are involving DataInputStream for no good reason, and in fact using it as an input to a Reader can be considered a case of broken code. Write this instead:
InputStream f1 = new FileInputStream("c:\\Temp\\fo1.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(f1));
A huge detriment to performance is the System.out you are using, especially if you measure the performance when running in Eclipse, but even if running from the command line. My guess is that this is the major cause of your bottleneck. By all means ensure you don't print anything in the main loop when you aim for top performance.
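To make both points concrete, here is a rough sketch of a scan loop that wraps the stream directly in an InputStreamReader and does no console output per line. The class name QuietScan, the method countMatches and the explicit charset parameter are my own additions for illustration; the question doesn't say which encoding the file uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class QuietScan {
    // Counts the lines containing the pattern. Nothing is printed inside the loop;
    // the caller reports the total once at the end.
    // The caller owns the stream and is responsible for closing it.
    public static int countMatches(InputStream in, Charset charset, String pattern)
            throws IOException {
        // No DataInputStream in between: the raw stream goes straight into the reader.
        BufferedReader br = new BufferedReader(new InputStreamReader(in, charset));
        int count = 0;
        String line;
        while ((line = br.readLine()) != null) {
            if (line.contains(pattern)) {
                count++;
            }
        }
        return count;
    }
}
```

The caller would open a FileInputStream on c:\\Temp\\fo1.txt, pass it in together with the file's charset and "code=[", and print the returned count a single time after the call.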
Upvotes: 2
Reputation: 28737
In addition to what Marko answered, I suggest closing br rather than f1:
br.close();
This will not affect performance, but it is cleaner (you close the outermost stream).
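If you are on Java 7 or later, try-with-resources does this for you. Below is a small sketch showing that closing the outermost reader also closes the stream underneath; TrackingStream is a hypothetical helper that exists only to observe the close() call, and the in-memory input stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CloseOutermost {
    // Hypothetical wrapper that records whether close() reached the inner stream.
    static class TrackingStream extends FilterInputStream {
        boolean closed = false;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Returns true if closing the BufferedReader also closed the wrapped stream.
    public static boolean closeReachesInnerStream() throws IOException {
        byte[] data = "code=[x\n".getBytes(StandardCharsets.UTF_8);
        TrackingStream inner = new TrackingStream(new ByteArrayInputStream(data));
        // br is closed automatically when the try block exits, even on an exception.
        try (BufferedReader br =
                 new BufferedReader(new InputStreamReader(inner, StandardCharsets.UTF_8))) {
            br.readLine();
        }
        return inner.closed;
    }
}
```

With the real file you would put the FileInputStream where the TrackingStream is, and no explicit close() call is needed at all.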
Upvotes: 1
Reputation: 15641
Have a look at java.util.regex. There is also an excellent tutorial from Oracle.
Copied from the Javadoc:
Classes for matching character sequences against patterns specified by regular expressions.
An instance of the Pattern class represents a regular expression that is specified in string form in a syntax similar to that used by Perl.
Instances of the Matcher class are used to match character sequences against a given pattern. Input is provided to matchers via the CharSequence interface in order to support matching against characters from a wide variety of input sources.
Unless otherwise noted, passing a null argument to a method in any class or interface in this package will cause a NullPointerException to be thrown.
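As a rough illustration of how this could apply to the question, the sketch below pulls out the text after code=[ with a precompiled Pattern. The class and method names are made up, and I'm not claiming a regex is faster than indexOf for a fixed literal like this one; it mainly pays off if the pattern gets more complicated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Compiled once and reused for every line; the '[' must be escaped in a regex.
    private static final Pattern CODE = Pattern.compile("code=\\[(.*)");

    // Returns the text following "code=[" on the line, or null if the marker is absent.
    public static String afterCode(String line) {
        Matcher m = CODE.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

Each line returned by readLine would be passed to afterCode, and non-null results collected.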
Upvotes: 0