StackerSapper

java - modify and return a BufferedInputStream

I have a BufferedInputStream that I got from a FileInputStream object, like:

BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);

Now I want to remove the chars { and } from the BufferedInputStream (I know the file contains them). I thought I could do it easily, like a string replace, but I saw that there is no simple way of doing that with a BufferedInputStream.

Any ideas how I can remove those specific chars from the BufferedInputStream and return the new, modified BufferedInputStream?

EDIT: In the end I want to detect the charset of a file, but the chars {} are causing me some issues, so I want to remove them before detecting the charset. This is how I am trying to detect the charset:

import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;

static String detectCharset(File file) {
    try (FileInputStream fileInputStream = new FileInputStream(file);
         BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
        CharsetDetector charsetDetector = new CharsetDetector();
        charsetDetector.setText(bufferedInputStream);
        charsetDetector.enableInputFilter(true);
        CharsetMatch cm = charsetDetector.detect();
        return cm.getName();
    } catch (Exception e) {
        return null;
    }
}

Answers (1)

rzwitserloot

NB: Adding a note to respond to the edit you made to your question: you can't really filter } from a bag of bytes unless you know the encoding, so if you want to filter } out in order to guess the encoding, you're in a chicken-and-egg situation. I also do not understand how removing { and } would somehow help a charset encoding detector. That sounds like the detector is buggy, or you're misinterpreting what it is doing. If you must, reframe the task as 'remove bytes 123 and 125 from an InputStream' instead of 'remove chars { and } from an InputStream', and you're closer to a workable job definition. The same principle applies: you'd write a FilterInputStream instead of a FilterReader, with almost the same methods, except that it checks for the byte values 123 and 125 instead of '{' and '}'.
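A minimal sketch of that byte-level filter follows. It assumes the bytes 123 (0x7B) and 125 (0x7D) only ever stand for `{` and `}`, which holds for ASCII, UTF-8 and single-byte charsets such as ISO-8859-1, but not for UTF-16 or for multi-byte charsets such as Shift_JIS, where those values can also occur inside other characters. The class name `RemoveBraceBytesInputStream` is made up for illustration.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: drops every byte 123 ('{' in ASCII/UTF-8) and 125 ('}')
// from the wrapped stream. Only meaningful if those byte values never occur
// inside other characters in the file's (unknown) encoding.
// Note: skip() is not overridden, so it still skips raw (unfiltered) bytes.
class RemoveBraceBytesInputStream extends FilterInputStream {
    private static final int OPEN_BRACE = 123;  // 0x7B
    private static final int CLOSE_BRACE = 125; // 0x7D

    RemoveBraceBytesInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = in.read();
        } while (b == OPEN_BRACE || b == CLOSE_BRACE); // -1 (end of stream) passes through
        return b;
    }

    // FilterInputStream's bulk read goes straight to the wrapped stream,
    // bypassing read() above, so it has to be filtered as well.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int kept = off;
        int n;
        do {
            n = in.read(buf, off, len);
            for (int i = off; i < off + n; i++) {
                if (buf[i] != OPEN_BRACE && buf[i] != CLOSE_BRACE) {
                    buf[kept++] = buf[i];
                }
            }
        } while (n > 0 && kept == off); // chunk held only braces: read more
        return n < 0 ? -1 : kept - off;
    }

    // Delegating mark/reset would count raw bytes, not filtered ones, so
    // report no support; wrap this stream in a BufferedInputStream instead.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readlimit) {
        // no-op: mark/reset is not supported
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```

ICU4J's `CharsetDetector.setText(InputStream)` expects a stream that supports mark/reset, so in the question's code the buffering would go on the outside, e.g. `charsetDetector.setText(new BufferedInputStream(new RemoveBraceBytesInputStream(fileInputStream)));`. Whether removing these bytes actually improves the detector's guess is not something this sketch can promise.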

-- original answer --

[1] InputStream refers to bytes; Reader is the same concept, except for characters. It does not make sense to say "filter all { from an InputStream". It would make sense to say "filter all occurrences of byte 123 from an InputStream". If the file is UTF-8 or ASCII, these two are equivalent, but there's no guarantee, and it's not 'nice' code in any fashion. To read files as text, this is how:

import java.io.BufferedReader;
import java.nio.file.*;

Path p = Paths.get("/path/to/file");
try (BufferedReader br = Files.newBufferedReader(p)) {
    // operate on the reader here
}

Note that unlike most Java methods, the methods in Files assume UTF-8. You can specify the encoding explicitly instead (Files.newBufferedReader(p, [ENCODING HERE])). You should never rely on the system default encoding being the right one; you cannot read a file as text unless you know which text encoding it was written in!
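To make the bytes-versus-characters point concrete, here is a small check of how `{` is encoded by a few standard charsets. It only uses `String.getBytes` and `StandardCharsets`; the class and method names are illustrative, not from the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodedBytes {
    // Encodes s with the given charset and returns the bytes as unsigned values (0-255).
    static int[] unsignedBytes(String s, Charset cs) {
        byte[] raw = s.getBytes(cs);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF;
        }
        return out;
    }

    public static void main(String[] args) {
        // '{' is the single byte 123 in ASCII and UTF-8 ...
        System.out.println(Arrays.toString(unsignedBytes("{", StandardCharsets.US_ASCII))); // [123]
        System.out.println(Arrays.toString(unsignedBytes("{", StandardCharsets.UTF_8)));    // [123]
        // ... but two bytes in UTF-16, so dropping byte 123 would leave a stray 0 behind
        System.out.println(Arrays.toString(unsignedBytes("{", StandardCharsets.UTF_16BE))); // [0, 123]
        // and byte 123 also appears inside an unrelated character (U+7B00) in UTF-16
        System.out.println(Arrays.toString(unsignedBytes("\u7B00", StandardCharsets.UTF_16BE))); // [123, 0]
    }
}
```

The last line is the chicken-and-egg problem in miniature: without knowing the encoding, you cannot tell whether a 123 byte is a brace.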

If you must use the old API:

import java.io.*;
import java.nio.charset.StandardCharsets;

try (FileInputStream fis = new FileInputStream("/path/to/file");
     InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
     BufferedReader br = new BufferedReader(isr)) {
    // operate on the reader here
}

Note that you MUST specify the charset here; otherwise InputStreamReader silently uses the platform default charset, and things break in subtle ways.

[2] To filter out certain characters, you can either do it 'inline' (in the code that reads chars from the reader), which is trivial, or you can create a wrapper Reader that does it. Something like:

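import java.io.*;

class RemoveBracesReader extends FilterReader {
    public RemoveBracesReader(Reader in) { super(in); }

    @Override
    public int read() throws IOException {
        int c;
        do { c = in.read(); } while (c == '{' || c == '}'); // -1 (end of stream) passes through
        return c;
    }

    // FilterReader's bulk read bypasses read() above, so it must be filtered too.
    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int kept = off, n;
        do {
            n = in.read(buf, off, len);
            for (int i = off; i < off + n; i++)
                if (buf[i] != '{' && buf[i] != '}') buf[kept++] = buf[i];
        } while (n > 0 && kept == off); // chunk was all braces: read more
        return n < 0 ? -1 : kept - off;
    }
}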
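The 'inline' option mentioned above needs no wrapper class at all: you skip the two characters in the loop that consumes the reader. A minimal sketch, assuming the whole text fits in memory; the names `InlineBraceFilter` and `readWithoutBraces` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class InlineBraceFilter {
    // Reads everything from the reader, dropping '{' and '}' as they are encountered.
    static String readWithoutBraces(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c != '{' && c != '}') {
                    sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new StringReader("{\"k\": [1, 2]}"))) {
            System.out.println(readWithoutBraces(br)); // "k": [1, 2]
        }
    }
}
```

For a real file, the reader would come from `Files.newBufferedReader(p, charset)` with the correct charset, since filtering characters only works once the bytes have been decoded.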
