JumperBot_
JumperBot_

Reputation: 572

How to read a File character-by-character in reverse without running out-of-memory?


The Story


I've been having a problem lately...

I have to read a file in reverse character by character without running out of memory.

I can't read it line-by-line and reverse it with StringBuilder because it's a one-line file that takes up to a gig (GB) of I/O space.

Hence it would take up too much of the JVM's (and the System's) Memory.

I've decided to just read it character by character from end-to-start (back-to-front) so that I could process as much as I can without consuming too much memory.


What I've Tried


I know how to read a file in one go:

(MappedByteBuffer+FileChannel+Charset which gave me OutOfMemoryExceptions)

and read a file character-by-character with UTF-8 character support

(FileInputStream+InputStreamReader).

The problem is that FileInputStream's #read() only calls #read0() which is a native method!

Because of that I have no idea about the underlying code...

Which is why I'm here today (or at least until this is done)!

Upvotes: 1

Views: 243

Answers (2)

WJS
WJS

Reputation: 40044

This will do it (but as written it is not very efficient).

  • just skip to the last location read less one and read and print the character.
  • then reset the location to the mark, adjust size and continue.
File f = new File("Some File name");
int size = (int) f.length();
int bsize = 1;
byte[] buf = new byte[bsize];
try (BufferedInputStream b =
        new BufferedInputStream(new FileInputStream(f))) {
    while (size > 0) {
        b.mark(size);
        b.skip(size - bsize);
        int k = b.read(buf);
        System.out.print((char) buf[0]);
        size -= k;
        b.reset();
    }
    
} catch (IOException ioe) {
    ioe.printStackTrace();
}

This could be improved by increasing the buffer size and making equivalent adjustments in the mark and skip arguments.

Updated Version

I wasn't fully satisfied with my answer so I made it more general. Some variables could have served double duty but using meaningful names helps clarify how they are used.

  • Mark must be used so reset can be used. However, it only needs to be set once and is set to position 0 outside of the main loop. I do not know if marking closer to the read point is more efficient or not.
  • skipCnt - initally set to fileLength it is the number of bytes to skip before reading. If the number of bytes remaining is greater than the buffer size, then the skip count will be skipCnt - bsize. Else it will be 0.
  • remainingBytes - a running total of how many bytes are still to be read. It is updated by subtracting the current readCnt.
  • readCnt - how many bytes to read. If remainingBytes is greater than bsize then set to bsize, else set to remainingBytes

The while loop continuously reads the file starting near the end and then prints the just read information in reverse order. All variables are updated and the process repeats until the remainingBytes reaches 0.

File f = new File("some file");
int bsize = 16;
int fileSize = (int)f.length();
int remainingBytes = fileSize;
int skipCnt = fileSize;
byte[] buf = new byte[bsize];
try (BufferedInputStream b =
        new BufferedInputStream(new FileInputStream(f))) {
    b.mark(0);
    while(remainingBytes > 0) {
        skipCnt = skipCnt > bsize ? skipCnt - bsize : 0;
        b.skip(skipCnt);
        int readCnt = remainingBytes > bsize ? bsize : remainingBytes; 
        b.read(buf,0,readCnt);
        for (int i = readCnt-1; i >= 0; i--) {
           System.out.print((char) buf[i]);
        }
        remainingBytes -= readCnt;
        b.reset();
    }
    
} catch (IOException ioe) {
    ioe.printStackTrace();
}

Upvotes: 2

tgdavies
tgdavies

Reputation: 11421

This doesn't support multi byte UTF-8 characters

Using a RandomAccessFile you can easily read a file in chunks from the end to the beginning, and reverse each of the chunks.

Here's a simple example:

import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.stream.IntStream;

class Test {
    private static final int BUF_SIZE = 10;
    private static final int FILE_LINE_COUNT = 105;

    public static void main(String[] args) throws Exception {
        // create a large file
        try (FileWriter fw = new FileWriter("largeFile.txt")) {
            IntStream.range(1, FILE_LINE_COUNT).mapToObj(Integer::toString).forEach(s -> {
                try {
                    fw.write(s + "\n");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        // reverse the file
        try (RandomAccessFile raf = new RandomAccessFile("largeFile.txt", "r")) {
            long size = raf.length();
            byte[] buf = new byte[BUF_SIZE];

            for (long i = size - BUF_SIZE; i > -BUF_SIZE; i -= BUF_SIZE) {
                long offset = Math.max(0, i);
                long readSize = Math.min(i + BUF_SIZE, BUF_SIZE);
                raf.seek(offset);
                raf.read(buf, 0, (int) readSize);
                for (int j = (int) readSize - 1; j >= 0; j--) {
                    System.out.print((char) buf[j]);
                }
            }
        }
    }
}

This uses a very small file and very small chunks so that you can test it easily. Increase those constants to see it work on a larger scale.

The input file contains newlines to make it easy to read the output, but the reversal doesn't depend on the file "having lines".

Upvotes: 2

Related Questions