Mike Matczynski

Reputation:

Java File I/O Performance Decreases Over Time

I'm trying to perform a once-through read of a large file (~4GB) using Java 5.0 x64 (on Windows XP).

Initially the file read rate is very fast, but gradually the throughput slows down substantially, and my machine seems very unresponsive as time goes on.

I've used Process Explorer to monitor the File I/O statistics, and it looks like the process initially reads at 500MB/sec, but this rate gradually drops to around 20MB/sec.

Any ideas on the best way to maintain file I/O rates, especially when reading large files with Java?

Here's some test code that shows the "interval time" continuing to increase. Just pass Main a file that's at least 500MB.

import java.io.File;
import java.io.RandomAccessFile;

public class MultiFileReader {

    public static void main(String[] args) throws Exception {
        MultiFileReader mfr = new MultiFileReader();
        mfr.go(new File(args[0]));
    }

    public void go(final File file) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        long fileLength = raf.length();
        System.out.println("fileLen: " + fileLength);
        raf.close();

        long startTime = System.currentTimeMillis();
        doChunk(0, file, 0, fileLength);
        System.out.println((System.currentTimeMillis() - startTime) + " ms");
    }

    public void doChunk(int threadNum, File file, long start, long end) throws Exception {
        System.out.println("Starting partition " + start + " to " + end);
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        raf.seek(start);

        long cur = start;
        byte[] buf = new byte[1000];
        int lastPercentPrinted = 0;
        long intervalStartTime = System.currentTimeMillis();
        while (true) {
            int numRead = raf.read(buf);
            if (numRead == -1) {
                break;
            }
            cur += numRead;
            if (cur >= end) {
                break;
            }

            int percentDone = (int) (100.0 * (cur - start) / (end - start));
            if (percentDone % 5 == 0 && lastPercentPrinted != percentDone) {
                lastPercentPrinted = percentDone;
                System.out.println("Thread" + threadNum + " Percent done: " + percentDone
                        + " Interval time: " + (System.currentTimeMillis() - intervalStartTime));
                intervalStartTime = System.currentTimeMillis();
            }
        }
        raf.close();
    }
}

Thanks!

Upvotes: 2

Views: 2166

Answers (5)

Ville Krumlinde

Reputation: 7131

The Java Garbage Collector could be a bottleneck here.

I would make the buffer larger and a private field of the class, so it is reused instead of being allocated on each call to doChunk().

public class MultiFileReader {

   private byte buf[] = new byte[256*1024];

   ...

}
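To make the reuse idea concrete, here is a minimal self-contained sketch (class and file names are mine, and it uses Java 7+ try-with-resources for brevity, whereas the question targets Java 5): the 256 KiB buffer is allocated once per object, so repeated reads create no per-call garbage for the collector to chase.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ReusedBufferReader {
    // Allocated once for the lifetime of the object, so repeated
    // calls to countBytes() produce no per-call garbage
    private final byte[] buf = new byte[256 * 1024];

    public long countBytes(File file) throws IOException {
        long total = 0;
        try (FileInputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Self-contained demo: write a 1 MB temp file, then read it back
        File f = File.createTempFile("reuse", ".bin");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[1000000]);
        }
        System.out.println(new ReusedBufferReader().countBytes(f)); // prints 1000000
    }
}
```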

Upvotes: 1

kohlerm

Reputation: 2624

Check this example:

    static void read3() throws IOException {

        // read from the file with buffering
        // and with direct access to the buffer

        MyTimer mt = new MyTimer();
        FileInputStream fis =
                     new FileInputStream(TESTFILE);
        cnt3 = 0;
        final int BUFSIZE = 1024;
        byte[] buf = new byte[BUFSIZE];
        int len;
        while ((len = fis.read(buf)) != -1) {
            for (int i = 0; i < len; i++) {
                if (buf[i] == 'A') {
                    cnt3++;
                }
            }
        }
        fis.close();
        System.out.println("read3 time = "
                                + mt.getElapsed());
    }

from http://java.sun.com/developer/JDCTechTips/2002/tt0305.html

The best buffer size might depend on the operating system. Yours may be too small.
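One way to check this empirically is to time a full sequential read at several buffer sizes. Here is a rough sketch (class and method names are mine); note the first pass may be skewed by OS caching, so run it more than once and compare later passes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class BufferSizeProbe {
    // Time one full sequential read of the file using the given buffer size
    static long timeReadMillis(File file, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long start = System.nanoTime();
        FileInputStream fis = new FileInputStream(file);
        try {
            while (fis.read(buf) != -1) {
                // just consume the data
            }
        } finally {
            fis.close();
        }
        return (System.nanoTime() - start) / 1000000;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        int[] sizes = {1000, 8 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024};
        for (int size : sizes) {
            System.out.println(size + " bytes: " + timeReadMillis(file, size) + " ms");
        }
    }
}
```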

Upvotes: 0

Will Dean

Reputation: 39500

Depending on your specific hardware and what else is going on, you might need to work reasonably hard to do much more than 20MB/sec.

I think perhaps you don't realise how completely off-the-scale 500MB/sec is...

What are you hoping for, and have you checked that your specific drive is even theoretically capable of it?

Upvotes: 1

stili

Reputation: 674

You could use JConsole to monitor your app, including memory usage. The 500 MB/sec sounds too good to be true.

Some more information about the implementation and VM arguments used would be helpful.

Upvotes: 0

Jon Skeet

Reputation: 1500155

I very much doubt that you're really getting 500MB per second from your disk. Chances are the data is cached by the operating system - and that the 20MB per second is what happens when it really hits the disk.

This will quite possibly be visible in the disk section of the Vista Resource Monitor - and a low-tech way to tell is to listen to the disk drive :)
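A rough way to see the cache effect from Java itself is to read the same file twice and compare the timings (a sketch with hypothetical names): the second pass is usually far faster because the OS already holds the file's pages in memory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class CacheEffectDemo {
    // Read the whole file sequentially and return elapsed time in milliseconds
    static long readAllMillis(File file) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long start = System.nanoTime();
        FileInputStream fis = new FileInputStream(file);
        try {
            while (fis.read(buf) != -1) {
                // discard the data; we only care about timing
            }
        } finally {
            fis.close();
        }
        return (System.nanoTime() - start) / 1000000;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        System.out.println("first read:  " + readAllMillis(file) + " ms");
        System.out.println("second read: " + readAllMillis(file) + " ms");
    }
}
```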

Upvotes: 10
