Ramachandra Reddy

Reputation: 301

Reading a >4GB file in Java

I have a mainframe data file which is greater than 4GB. I need to read and process the data in records of 500 bytes. I have tried using a FileChannel, but I am getting an error with the message "Integer.MAX_VALUE exceeded".

public void getFileContent(String fileName) {
    RandomAccessFile aFile = null;
    FileChannel inChannel = null;
    try {
        aFile = new RandomAccessFile(Paths.get(fileName).toFile(), "r");
        inChannel = aFile.getChannel();
        ByteBuffer buffer = ByteBuffer.allocate(500 * 100000);
        while (inChannel.read(buffer) > 0) {
            buffer.flip();
            for (int i = 0; i < buffer.limit(); i++) {
                byte[] data = new byte[500];
                buffer.get(data);
                processData(new String(data));
                buffer.clear();
            }
        }
    } catch (Exception ex) {
        // TODO
    } finally {
        try {
            inChannel.close();
            aFile.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Can you help me out with a solution?

Upvotes: 1

Views: 1419

Answers (1)

Holger

Reputation: 298203

The worst problem of your code is the

catch (Exception ex) {
    // TODO
}

part, which implies that you won’t notice any exceptions thrown by your code. Since there is nothing in the JRE that prints an “Integer.MAX_VALUE exceeded” message, that problem must be connected to your processData method.

It might be worth noting that this method will be invoked way too often with repeated data.

Your loop

for (int i = 0; i < buffer.limit(); i++) {

implies that you iterate as many times as there are bytes within the buffer, up to 500 * 100000 times. Since you extract 500 bytes in each iteration, this would attempt to process a total of up to 500 * 500 * 100000 bytes per read and should end in a BufferUnderflowException. The only reason it doesn’t is the misplaced buffer.clear(); at the end of the loop body, which resets the position to zero (and the limit to the capacity) on every iteration. As a consequence, you invoke processData up to 500 * 100000 times, each time with the same first 500 bytes of the buffer.
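A minimal sketch of how that loop could look instead, still using your ByteBuffer (this assumes the file consists of complete 500-byte records, and a processData(String) method as in your code):

buffer.flip();                        // switch the buffer to draining mode
byte[] record = new byte[500];
while (buffer.remaining() >= 500) {   // one iteration per complete record
    buffer.get(record);               // advances the position by 500 bytes
    processData(new String(record));
}
buffer.compact();                     // keep any partial record for the next read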

But even with that fixed, the whole conversion from bytes to a String is unnecessarily verbose and performs avoidable copy operations. Instead of implementing this yourself, you can and should simply use a Reader.

Besides that, your code makes a strange detour: it starts with the Java 7 API Paths.get, converts the result to a legacy File object, and creates a legacy RandomAccessFile, only to eventually acquire a FileChannel. If you have a Path and want a FileChannel, you should open it directly via FileChannel.open. And, of course, use a try(…) { … } statement to ensure proper closing.
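That setup could look like this (only the channel acquisition is shown; the read loop goes inside the try block):

try (FileChannel inChannel = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ)) {
    // read from inChannel as shown above
}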

But, as said, if you want to process the contents as Strings, you surely want to use a Reader instead:

public void getFileContent(String fileName) {
    try (Reader reader = Files.newBufferedReader(Paths.get(fileName))) {
        CharBuffer buffer = CharBuffer.allocate(500 * 100000);
        while (reader.read(buffer) > 0) {
            buffer.flip(); // switch the buffer to draining mode
            while (buffer.remaining() >= 500) { // a complete 500 char chunk is available
                processData(buffer.slice().limit(500).toString());
                buffer.position(buffer.position() + 500); // slice() doesn’t advance the position
            }
            buffer.compact(); // keep an incomplete chunk for the next read
        }
        // there might be a remaining chunk of less than 500 characters
        if (buffer.position() > 0) {
            processData(buffer.flip().toString());
        }
    } catch (Exception ex) {
        // the *minimum* to do:
        ex.printStackTrace();
        // TODO real exception handling
    }
}

There is no problem with processing files >4GB; I just tested it with an 8GB file. Note that the code above uses the UTF-8 encoding, the default of Files.newBufferedReader. If you want to retain the behavior of your original code, which used whatever happens to be your system’s default encoding, you can create the Reader using

Files.newBufferedReader(Paths.get(fileName), Charset.defaultCharset())

instead.
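And since the data originates from a mainframe, it might actually use an EBCDIC encoding, in which case neither UTF-8 nor the default charset is right. Assuming your runtime ships the extended charsets (standard full JDK distributions do), you could name the encoding explicitly, e.g.

Files.newBufferedReader(Paths.get(fileName), Charset.forName("IBM1047"))

where IBM1047 is just one common EBCDIC variant; the correct charset name depends on your data.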

Upvotes: 3
