esiegel

Reputation: 1783

Improve BufferedReader Speed

I am crunching through many gigabytes of text data and I was wondering if there is a way to improve performance. For example, iterating line by line through 10 gigabytes of data without processing it at all takes about 3 minutes.

Basically I have a dataIterator wrapper that contains a BufferedReader. I continuously call this iterator, which returns the next line.
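For illustration, the read loop presumably looks roughly like this minimal sketch (the class name, file path, and the enlarged 1 MB buffer are assumptions, not the asker's actual code; a bigger explicit buffer is one cheap thing to try, since BufferedReader defaults to 8 KB):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class BigBufferReader {
    public static long countLines(File file) throws IOException {
        // Default BufferedReader buffer is 8 KB; a larger buffer (here 1 MB)
        // reduces the number of underlying read() calls on big files.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8),
                1 << 20)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample file so the sketch is self-contained.
        File tmp = File.createTempFile("sample", ".txt");
        tmp.deleteOnExit();
        try (PrintWriter out = new PrintWriter(tmp, "UTF-8")) {
            for (int i = 0; i < 1000; i++) out.println("line " + i);
        }
        System.out.println(countLines(tmp)); // prints 1000
    }
}
```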

Is the problem the number of strings being created? Or perhaps the number of function calls? I don't really know how to profile this application because it gets compiled as a jar and used as a STAF service.

Any and all ideas appreciated.

Upvotes: 4

Views: 4192

Answers (4)

John M

Reputation: 13239

If the program is launched via a regular "java -options... ClassName args..." command line, you can profile it. I'm most familiar with the NetBeans Profiler. It can start the Java app separately (by adding a java option to the startup) and then attach the profiler to it.

If you're trying to optimize without measuring what needs improvement, you're working in the dark. You might get lucky or you might spend lots of time doing irrelevant work.

Upvotes: 0

Javamann

Reputation: 2922

Using NIO, Channels, byte buffers, and memory-mapped files will give you the best performance. It's about as close to the hardware as you are going to get. I had a similar problem where I had to parse over 6 million delimited lines of text (a 265 MB file), rearrange the delimited columns in each line, and then write it back out. Using NIO and 2002 hardware, it took 33 seconds. The trick is to leave the data as bytes: have one thread reading the data to extract the lines, another thread manipulating each line, and a third thread writing the results back out.

Upvotes: 1

Yuval Adam

Reputation: 165340

Let's start with the basics: your application is I/O-bound. You are not suffering bad performance due to object allocation, or memory, or CPU limits. Your application is running slowly because of disk access.

If you think you can improve file access, you might need to resort to lower-level programming using JNI. File access can be improved if you handle it more efficiently yourself, and that needs to be done at a lower level.

I am not sure that using java.nio will give you the order-of-magnitude improvement you are looking for, although it might give you some more freedom to do CPU- and memory-intensive operations while I/O is running.

The reason is that, basically, java.nio wraps the file reading with a selector, letting you be notified when a buffer is ready for use, which does give you the asynchronous behavior that might help your performance a bit. But reading the file itself is your bottleneck, and java.nio doesn't give you anything in that area.

So try it out first, but I wouldn't keep my hopes too high for it.
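"Trying it out" could look roughly like this sketch: a plain FileChannel read into a reusable direct ByteBuffer, which at least avoids per-line String allocation (the class name and 64 KB buffer size are assumptions for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class ChannelReader {
    // Reads a file through a FileChannel into one reusable direct buffer,
    // counting newline bytes; no Strings are allocated per line.
    public static long countLines(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 16); // 64 KB
            long count = 0;
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch from filling to draining the buffer
                while (buffer.hasRemaining()) {
                    if (buffer.get() == (byte) '\n') count++;
                }
                buffer.clear(); // reuse the same buffer for the next read
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel", ".txt");
        Files.write(tmp, "one\ntwo\n".getBytes("UTF-8"));
        System.out.println(countLines(tmp)); // prints 2
        Files.delete(tmp);
    }
}
```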

Upvotes: 7

yalestar

Reputation: 9574

I think Java's NIO package would be immensely useful for your needs.

This Wikipedia article has some great background info on the specific improvements over "old" Java I/O.

Upvotes: 3
