h0lmesxx

Reputation: 63

Java Reading large files into byte array chunk by chunk

So I've been trying to make a small program that inputs a file into a byte array, then it will turn that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and then save it as a custom file.

I studied a lot of internet code and I can turn a file into a byte array and into hex, but the problem is I can't turn huge files into byte arrays (out of memory).

This is the code that is not a complete failure:

public void rundis(Path pp) {
    byte bb[] = null;

    try {
        bb = Files.readAllBytes(pp); //Files.toByteArray(pathhold);
        System.out.println("byte array made");
    } catch (Exception e) {
        e.printStackTrace();
    }
    if (bb != null && bb.length != 0) { // null check first, otherwise bb.length throws a NullPointerException
        System.out.println("byte array filled");
        //send to method to turn into hex
    } else {
        System.out.println("byte array NOT filled");
    }

}

I know how the process should go, but I don't know how to code that properly.

The process if you are interested:

Problem: I don't know how to turn a huge file into a byte array chunk by chunk so it can be processed. Any and all help will be appreciated, thank you for reading :)

Upvotes: 5

Views: 27515

Answers (2)

tkausl

Reputation: 14279

To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files but, as you noticed, it loads the whole file into memory at once, so it does not work for large files.

In pseudocode it would look something like this:

while there are more bytes available
    read some bytes
    process those bytes
    (write the result back to a file, if needed)

In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:

FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));

We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:

byte[] buf = new byte[4096];

How many bytes is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:

int read = is.read(buf);

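This will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, which can be less than buf.length (for example, for the last chunk), or -1 once the end of the file has been reached. Then we process the bytes: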

//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);

process() in the example above is your processing method. It takes a byte array and the number of bytes it should process, and returns the result as a byte array.

Last, we write the result back to a file:

os.write(ret);

We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:

int read = 0;
while((read = is.read(buf)) > 0) {
    byte[] ret = process(buf, read);
    os.write(ret);
}
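To make the loop condition a bit more concrete, here is a minimal sketch of what read(buf) returns call by call. It assumes 10 bytes of in-memory data (a ByteArrayInputStream standing in for the file) and a 4-byte buffer; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the answer's code.
class ReadReturnDemo {
    // Collects the value returned by each read(buf) call, including the final -1.
    static List<Integer> readSizes(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        List<Integer> sizes = new ArrayList<>();
        int read;
        do {
            read = in.read(buf);
            sizes.add(read);
        } while (read != -1);
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes of in-memory data stand in for a file; the buffer holds 4.
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(readSizes(in, 4)); // prints [4, 4, 2, -1]
    }
}
```

The last real chunk is shorter than the buffer, which is why process() gets the byte count as a second argument, and the -1 is what ends the while loop.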

And finally, close the streams:

is.close();
os.close();

And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. It's up to you what to do with the result: you could send it over TCP, drop it if it's not needed, or even read from TCP instead of a file. The basic logic is the same.

This still needs some proper error handling to deal with missing files or wrong permissions, but that's up to you to implement.
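As a rough idea of what that error handling could look like, here is a sketch that puts the pieces together with try-with-resources, so both streams get closed even when something goes wrong. The class name ChunkedCopy, the method processFile, and the placeholder process() that just copies the bytes are assumptions for illustration; swap in your own processing.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical wrapper class, not part of the answer's code.
class ChunkedCopy {
    // Placeholder for the real processing step: returns a copy of the valid bytes.
    static byte[] process(byte[] data, int length) {
        return Arrays.copyOf(data, length);
    }

    // Reads inputPath in 4096-byte chunks, processes each chunk, and writes the result.
    static void processFile(String inputPath, String outputPath) throws IOException {
        // try-with-resources closes both streams even if an exception is thrown.
        try (InputStream is = new FileInputStream(inputPath);
             OutputStream os = new FileOutputStream(outputPath)) {
            byte[] buf = new byte[4096];
            int read;
            while ((read = is.read(buf)) > 0) {
                os.write(process(buf, read));
            }
        }
    }

    public static void main(String[] args) {
        try {
            processFile("input.txt", "output.txt");
        } catch (IOException e) {
            // Covers a missing input file, denied permissions, and read/write failures.
            System.err.println("Could not process file: " + e.getMessage());
        }
    }
}
```

FileNotFoundException is a subclass of IOException, so a single catch block is enough here; catch it separately if you want a different message for a missing file.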


An example implementation for the process method:

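import java.nio.charset.StandardCharsets;

//returns the hex representation of the bytes as ASCII characters
public static byte[] process(byte[] bytes, int length) {
    //byte[] instead of char[], so the result matches the declared return type
    final byte[] hexchars = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    byte[] ret = new byte[length * 2];       //two hex digits per input byte
    for (int i = 0; i < length; ++i) {
        int b = bytes[i] & 0xFF;             //treat the byte as unsigned (0..255)
        ret[i * 2] = hexchars[b >>> 4];      //high nibble
        ret[i * 2 + 1] = hexchars[b & 0x0F]; //low nibble
    }
    return ret;
}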
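Since the question also mentions turning the bytes into binary, the same per-chunk idea could be sketched like this, with eight output characters per input byte instead of two. The class BinaryChunk and the method toBinary are hypothetical names, and writing the bits as ASCII '0'/'1' characters is just one possible choice.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the answer's code.
class BinaryChunk {
    // Returns the binary representation of the first `length` bytes as ASCII '0'/'1' characters.
    static byte[] toBinary(byte[] bytes, int length) {
        byte[] ret = new byte[length * 8];
        for (int i = 0; i < length; ++i) {
            int b = bytes[i] & 0xFF; // treat the byte as unsigned (0..255)
            for (int bit = 0; bit < 8; ++bit) {
                // Most significant bit first.
                ret[i * 8 + bit] = (byte) (((b >>> (7 - bit)) & 1) == 1 ? '1' : '0');
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        byte[] chunk = {0x0A, (byte) 0xFF};
        System.out.println(new String(toBinary(chunk, chunk.length), StandardCharsets.US_ASCII));
        // prints 0000101011111111
    }
}
```

Because it has the same shape as process() (a buffer plus the number of valid bytes in, a byte array out), it can be dropped into the read loop the same way.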

Upvotes: 9

user6749601


To chunk your input use a FileInputStream:

    Path pp = FileSystems.getDefault().getPath("logs", "access.log");
    final int BUFFER_SIZE = 1024*1024; // buffer size in bytes (1 MiB)

    FileInputStream fis = new FileInputStream(pp.toFile());
    byte[] buffer = new byte[BUFFER_SIZE]; 
    int read = 0;
    while( ( read = fis.read( buffer ) ) > 0 ){
        // call your other methods here; only the first `read` bytes of buffer are valid
    }

    fis.close();
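Since the question already has a Path, a variation on the snippet above might use Files.newInputStream and hand each chunk to a callback. This is only a sketch: PathChunkReader, ChunkHandler, and forEachChunk are hypothetical names, and the demo writes its own temporary file instead of reading logs/access.log.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the answer's code.
class PathChunkReader {
    // Hypothetical callback type for whatever should happen to each chunk.
    interface ChunkHandler {
        void handle(byte[] buffer, int length) throws IOException;
    }

    // Reads the file chunk by chunk and returns the total number of bytes seen.
    static long forEachChunk(Path path, int bufferSize, ChunkHandler handler) throws IOException {
        long total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = in.read(buffer)) > 0) {
                // Only buffer[0..read) is valid; the rest may hold bytes from an earlier chunk.
                handler.handle(buffer, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path pp = Files.createTempFile("chunk-demo", ".bin");
        Files.write(pp, new byte[2_500_000]);
        long total = forEachChunk(pp, 1024 * 1024,
                (buffer, length) -> System.out.println("got " + length + " bytes"));
        System.out.println("total: " + total); // prints total: 2500000
        Files.delete(pp);
    }
}
```

The buffer is reused between iterations, so the handler should copy the bytes if it needs to keep them after it returns.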

Upvotes: 14
