Vojtěch

Reputation: 12416

Java: Reading file in two parts - partly as String and partly as byte[]

I have a file that is split into two parts by "\n\n": the first part is a fairly short String, and the second is a byte array, which can be quite long.

I am trying to read the file as follows:

    byte[] result;
    try (final FileInputStream fis = new FileInputStream(file)) {

        final InputStreamReader isr = new InputStreamReader(fis);
        final BufferedReader reader = new BufferedReader(isr);

        String line;
        // reading until \n\n
        while (!(line = reader.readLine()).trim().isEmpty()){
            // processing the line
        }

        // copying the rest of the byte array
        result = IOUtils.toByteArray(reader);
        reader.close();
    }

Even though the resulting array has the size it should have, its contents are broken. If I call toByteArray directly on fis or isr instead, result is empty.

How can I read the rest of the file correctly and efficiently?

Thanks!

Upvotes: 0

Views: 227

Answers (3)

Vojtěch

Reputation: 12416

Thanks for all the comments. This is the final implementation:

    try (final FileInputStream fis = new FileInputStream(file)) {

        ByteBuffer buffer = ByteBuffer.allocate(64);

        boolean wasLast = false;
        String headerValue = null, headerKey = null;
        byte[] result = null;

        while (true) {
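            // read() returns an int so that -1 (end of file) can be told apart from the byte 0xFF
            int current = fis.read();
            if (current == -1) {
                throw new EOFException("file ended before the \\n\\n separator");
            }
            if (current == '\n') {
                if (wasLast) {
                    // this is \n\n
                    break;
                } else {
                    // just a new line in header
                    wasLast = true;
                    headerValue = new String(buffer.array(), 0, buffer.position());
                    buffer.clear();
                }
            } else if (current == '\t') {
                // headerKey\theaderValue\n
                headerKey = new String(buffer.array(), 0, buffer.position());
                buffer.clear();
                wasLast = false;
            } else {
                // throws BufferOverflowException if a key or value exceeds 64 bytes
                buffer.put((byte) current);
                wasLast = false;
            }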
        }
        // reading the rest
        result = IOUtils.toByteArray(fis);
    }

Upvotes: 0

Evgeniy Dorofeev

Reputation: 136042

Alternatively, you could read the whole file into a byte array, find the position of \n\n, and split the array into the line and the bytes:

    byte[] a = Files.readAllBytes(Paths.get("file"));
    String line = "";
    byte[] result = a;
    for (int i = 0; i < a.length - 1; i++) {
        if (a[i] == '\n' && a[i + 1] == '\n') {
            line = new String(a, 0, i);
            // skip both separator bytes so the result starts right after "\n\n"
            int len = a.length - i - 2;
            result = new byte[len];
            System.arraycopy(a, i + 2, result, 0, len);
            break;
        }
    }

Upvotes: 1

Markus A.

Reputation: 12742

Your contents are broken because the data passes through a Reader: the InputStreamReader decodes the 8-bit binary values into text characters using whatever logic your default character encoding prescribes, and IOUtils.toByteArray(reader) then encodes those characters back into bytes. This usually corrupts many of the binary values.

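If both the decoding and the encoding step use the same single-byte charset, this might work:

    final InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
    result = IOUtils.toByteArray(reader, "ISO-8859-1");

ISO-8859-1 uses only a single byte per character. The standard does not define every byte value, but Java's implementation maps each byte 0-255 to the character with the same code point, so the values pass through unchanged. Note that the reader itself has to be created with this charset; setting it only in toByteArray is not enough.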
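The effect of the charset on such a decode/encode round trip can be checked in isolation. The following is a minimal sketch, not part of the original code: the class name CharsetRoundTrip and the helper roundTrip are made up for illustration, and it uses new String(...)/getBytes(...) in place of the Reader-based path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name is an assumption made for this sketch.
class CharsetRoundTrip {

    // Decodes the bytes into a String and encodes that String again,
    // both times with the same charset.
    static byte[] roundTrip(byte[] input, Charset charset) {
        return new String(input, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Every possible byte value exactly once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        boolean latin1Ok = Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1));
        boolean utf8Ok = Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8));

        System.out.println("ISO-8859-1 round trip lossless: " + latin1Ok);
        System.out.println("UTF-8 round trip lossless: " + utf8Ok);
    }
}
```

With UTF-8, bytes that do not form a valid sequence are replaced during decoding, which is the kind of corruption described above.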

But a much cleaner solution would be to read the String at the beginning as binary data and then convert it to text via new String(bytes), rather than reading the binary data at the end as a String and converting it back.

This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
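As a rough sketch of that idea, a plain BufferedInputStream may already be enough for the buffering, since it hands back raw bytes and never decodes them. The class HeaderBodyReader, the holder type Parts, and the choice of UTF-8 for the header are all assumptions made for this example, not something from the question.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; all names here are assumptions made for this sketch.
class HeaderBodyReader {

    // Simple holder for the two parts of the file.
    static final class Parts {
        final String header;
        final byte[] body;

        Parts(String header, byte[] body) {
            this.header = header;
            this.body = body;
        }
    }

    static Parts read(InputStream raw) throws IOException {
        InputStream in = new BufferedInputStream(raw);

        // Collect header bytes until two consecutive '\n' bytes are seen.
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int previous = -1;
        int current;
        boolean separatorFound = false;
        while ((current = in.read()) != -1) {
            if (current == '\n' && previous == '\n') {
                separatorFound = true;
                break;
            }
            header.write(current);
            previous = current;
        }
        if (!separatorFound) {
            throw new IOException("no \\n\\n separator found");
        }

        // The collected bytes end with the first '\n' of the separator; drop it.
        // Assumption: the header text is UTF-8.
        byte[] headerBytes = header.toByteArray();
        String headerText =
                new String(headerBytes, 0, headerBytes.length - 1, StandardCharsets.UTF_8);

        // Everything after the separator is copied as raw bytes, never decoded.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            body.write(chunk, 0, n);
        }
        return new Parts(headerText, body.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {'k', '\t', 'v', '\n', '\n', 0, (byte) 0xFF, '\n', '\n', 1};
        Parts parts = read(new ByteArrayInputStream(file));
        System.out.println("header length: " + parts.header.length());
        System.out.println("body length: " + parts.body.length);
    }
}
```

For a real file, a FileInputStream opened in a try-with-resources block would be passed to read(...) in place of the ByteArrayInputStream.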

You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:

http://www.docjar.com/html/api/java/io/BufferedReader.java.html

It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.

Upvotes: 1
