sds

Reputation: 60004

How do I find out how many characters or bytes have been read from a stream?

Java has LineNumberReader which lets me keep track of the line I am on, but how do I keep track of the byte (or char) position in a stream?

I want something similar to lseek(<fd>,0,SEEK_CUR) for files in C.

EDIT: I am reading a file using LineNumberReader in = new LineNumberReader(new FileReader(file)) and I want to be able to print something like "processed XX% of the file" every now and then. The easiest way I know is to look at the file.length() first and divide the current file position by it.

Upvotes: 2

Views: 1633

Answers (2)

Alex Cohn

Reputation: 57173

The ByteCountingInputStream solution has a drawback: it counts input bytes as soon as they are read into the buffer, before the LineNumberReader has processed them. That was not what I needed for my reporting, so I came up with an alternative. I assume the input file is ASCII text with Unix-style line endings (a single LF character).

I have built a wrapper that exposes a subset of LineNumberReader and adds position reporting:

import java.io.*;

public class FileLineNumberReader {
    private final LineNumberReader lnr;
    private final long length;
    private long pos;

    public FileLineNumberReader(String path) throws IOException {
        lnr = new LineNumberReader(new FileReader(path));
        length = new File(path).length();
    }

    public long getLineNumber() {
        return lnr.getLineNumber();
    }

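    public String readLine() throws IOException {
        String res = lnr.readLine();
        if (res != null) {
            // ASCII text: one byte per char, plus one byte for the LF terminator.
            pos += res.length() + 1;
        }
        return res;
    }

    public long getPercent() {
        // Avoid dividing by zero for an empty file, and clamp the result:
        // a last line without a trailing LF makes pos overshoot by one.
        return length == 0 ? 100 : Math.min(100, 100 * pos / length);
    }

    public void close() throws IOException {
        lnr.close();
    }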
}

Note that this class does not expose many of the methods of the encapsulated LineNumberReader, because they are not relevant for my purposes.
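For input that is not plain ASCII, line.length() no longer equals the number of bytes the line occupied on disk. Below is a minimal sketch of how the pos update could be adapted. It assumes the file's charset is known, every line still ends in a single LF, and the file has no byte-order mark. LineByteLength and its of method are hypothetical names used only for illustration; they are not part of the class above.

```java
import java.nio.charset.Charset;

// Hypothetical helper: estimates how many bytes one line took up in the file.
class LineByteLength {

    // Encoded size of the line's text plus one byte for an assumed LF terminator.
    // This re-encodes the line, so it costs an extra pass over every line read.
    static long of(String line, Charset charset) {
        return line.getBytes(charset).length + 1L;
    }
}
```

In readLine() the update would then read pos += LineByteLength.of(res, charset); with the same charset that was used to decode the file.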

Upvotes: 1

John Watts

Reputation: 8875

I suggest extending FilterInputStream as follows:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteCountingInputStream extends FilterInputStream {

    private long position = 0;

    // Public so that callers outside this package can construct it.
    public ByteCountingInputStream(InputStream in) {
        super(in);
    }

    public long getPosition() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int byteRead = super.read();
        // read() returns -1 at end of stream; 0 is a valid byte value.
        if (byteRead >= 0) {
            position++;
        }
        return byteRead;
    }

    // read(byte[]) is deliberately not overridden: FilterInputStream
    // implements it as read(b, 0, b.length), so counting there as well
    // would count every byte twice.

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int bytesRead = super.read(b, off, len);
        if (bytesRead > 0) {
            position += bytesRead;
        }
        return bytesRead;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        position += skipped;
        return skipped;
    }

    // mark/reset are disabled because rewinding would invalidate position.
    @Override
    public synchronized void mark(int readlimit) {
        // Intentionally a no-op.
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    @Override
    public boolean markSupported() {
        return false;
    }

}

And you would use it like this:

File f = new File("filename.txt");
ByteCountingInputStream bcis = new ByteCountingInputStream(new FileInputStream(f));
long chars = 0;
String line;
// try-with-resources closes the reader and, through it, the wrapped streams.
try (LineNumberReader lnr = new LineNumberReader(new InputStreamReader(bcis))) {
    while ((line = lnr.readLine()) != null) {
        // +2 assumes CRLF line endings; use +1 for files that end lines with LF only.
        chars += line.length() + 2;
        System.out.println("Chars read: " + chars);
        System.out.println("Bytes read: " + bcis.getPosition());
    }
}

You will notice a few things:

  1. This version counts bytes because it is an InputStream (it extends FilterInputStream).
  2. It might just be easier to count the characters or bytes yourself in the client code.
  3. This code counts bytes as soon as they are read from the filesystem into a buffer, even if the LineNumberReader has not processed them yet. You could count characters in a subclass of LineNumberReader instead to get around this. Unfortunately, you can't easily produce a percentage that way because, unlike bytes, there is no cheap way to know the number of characters in a file.
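The subclass idea from point 3 could look roughly like this. It is only a sketch under stated assumptions: CharCountingLineNumberReader and getCharsRead are made-up names, the caller has to say how many characters each line terminator occupies (1 for LF, 2 for CRLF), and only lines returned by readLine() are counted, not characters consumed through read().

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;

// Hypothetical subclass: counts characters only once readLine() has returned them.
class CharCountingLineNumberReader extends LineNumberReader {

    private final int terminatorLength; // assumed: 1 for LF, 2 for CRLF
    private long charsRead;

    CharCountingLineNumberReader(Reader in, int terminatorLength) {
        super(in);
        this.terminatorLength = terminatorLength;
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        if (line != null) {
            // Overcounts by terminatorLength if the last line has no terminator.
            charsRead += line.length() + terminatorLength;
        }
        return line;
    }

    long getCharsRead() {
        return charsRead;
    }
}
```

Because the count advances only when a line is handed to the caller, it is not affected by how far ahead the internal buffer has read.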

Upvotes: 1
