Kosala Lakshitha

Reputation: 37

How to convert a Reader to an InputStream in Java

I need to convert a Reader object into an InputStream. My current solution is below, but because it will handle large amounts of data, I am concerned that it will drastically increase memory usage.

private static InputStream getInputStream(final Reader reader) {
   char[] buffer = new char[10240];
   StringBuilder builder = new StringBuilder();
   int charCount;
   try {
      while ((charCount = reader.read(buffer, 0, buffer.length)) != -1) {
         builder.append(buffer, 0, charCount);
      }
      reader.close();
   } catch (final IOException e) {
      e.printStackTrace();
   }
   return new ByteArrayInputStream(builder.toString().getBytes(StandardCharsets.UTF_8));
}

Because I use a StringBuilder, the full content of the Reader is kept in memory, which I want to avoid. Is there a way to pipe the Reader instead? Any help would be highly appreciated.

Upvotes: 1

Views: 3115

Answers (3)

bjmi

Reputation: 603

I am not aware of a way to do this with the JDK alone. StringBufferInputStream is deprecated because it does not convert characters into bytes properly.
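To illustrate the deprecation remark, here is a minimal sketch of what goes wrong: StringBufferInputStream only emits the low eight bits of each char, so anything outside Latin-1 is mangled. The class name LowByteDemo and the helper firstByte are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringBufferInputStream;

public class LowByteDemo {

    /** Hypothetical helper: returns the first value the deprecated stream yields for the text. */
    @SuppressWarnings("deprecation")
    static int firstByte(String text) throws IOException {
        try (InputStream in = new StringBufferInputStream(text)) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // U+20AC (euro sign) is E2 82 AC in UTF-8, but only the low byte 0xAC comes out.
        System.out.printf("0x%02X%n", firstByte("\u20AC")); // prints 0xAC
    }
}
```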
If the Guava library is available in the project, its internal ReaderInputStream can be used via CharSource:

public InputStream asInputStream(Reader reader, Charset charset) throws IOException {
    return new CharSource() {
        @Override public Reader openStream() {
            return reader;
        }
    }.asByteSource(charset).openStream();
}

Under the hood, the Reader is wrapped in Guava's internal ReaderInputStream mentioned above.

Upvotes: 0

Joop Eggen

Reputation: 109613

First, this is a rare requirement: usually the conversion goes the other way around, or a FileChannel is available, so that a ByteBuffer can be used.

A PipedInputStream would be possible, fed by a PipedOutputStream that is written to in a second thread. However, that is not needed.
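For comparison only, the piped variant could look roughly like the sketch below. The class name PipedReaderDemo, the helper toInputStream, and the buffer sizes are assumptions for illustration; error handling is deliberately minimal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class PipedReaderDemo {

    /** Hypothetical helper: pumps the reader's chars as UTF-8 bytes through a pipe. */
    static InputStream toInputStream(Reader reader) throws IOException {
        PipedInputStream in = new PipedInputStream(8192);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread pump = new Thread(() -> {
            // Closing the writer closes the pipe, which signals EOF to the consumer.
            try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                char[] buffer = new char[4096];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, n);
                }
            } catch (IOException e) {
                // A real implementation would hand this error over to the consumer.
                e.printStackTrace();
            }
        });
        pump.setDaemon(true);
        pump.start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = toInputStream(new StringReader("h\u00e9llo"))) {
            byte[] bytes = in.readAllBytes(); // Java 9+
            System.out.println(bytes.length); // 6: the accented char takes two bytes
        }
    }
}
```

Only the pipe's fixed-size buffer is held in memory, but the extra thread and the hand-over of exceptions make this more fragile than a single-threaded wrapper.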

A Reader delivers chars. A Unicode code point is formed from either one char or two chars (the latter being a surrogate pair).
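As a small illustration of the surrogate-pair point, the sketch below shows a code point outside the Basic Multilingual Plane occupying two chars and four UTF-8 bytes. The class name SurrogateDemo and the helper utf8Length are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {

    /** Hypothetical helper: number of UTF-8 bytes needed for one code point. */
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // outside the Basic Multilingual Plane
        char[] chars = Character.toChars(codePoint);
        System.out.println(chars.length);                        // 2
        System.out.println(Character.isHighSurrogate(chars[0])); // true
        System.out.println(Character.isLowSurrogate(chars[1]));  // true
        System.out.println(utf8Length(codePoint));               // 4
        System.out.println(utf8Length('A'));                     // 1
    }
}
```

This is why a char-by-char conversion has to read the second char before encoding: a high surrogate alone cannot be turned into valid UTF-8.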

/**
 * InputStream that serves the chars of a Reader as UTF-8 encoded bytes.
 */
public class ReaderInputStream extends InputStream {

    private final Reader reader;
    private boolean eof;
    private int byteCount;
    private byte[] bytes = new byte[6];

    public ReaderInputStream(Reader reader) {
        this.reader = reader;
    }
    
    @Override
    public int read() throws IOException {
        if (byteCount > 0) {
            // Mask to 0..255: read() must never return a negative value for data.
            int c = bytes[0] & 0xFF;
            --byteCount;
            for (int i = 0; i < byteCount; ++i) {
                bytes[i] = bytes[i + 1];
            }
            return c;
        }
        if (eof) {
            return -1;
        }

        int c = reader.read();
        if (c == -1) {
            eof = true;
            return -1;
        }
        char ch = (char) c;
        String s;
        if (Character.isHighSurrogate(ch)) {
            c = reader.read();
            if (c == -1) {
                // Error: a low surrogate was expected, not EOF.
                eof = true;
                throw new IOException("Expected a low surrogate char instead of EOF");
            }
            char ch2 = (char) c;
            if (!Character.isLowSurrogate(ch2)) {
                throw new IOException("Expected a low surrogate char");
            }
            s = new String(new char [] {ch, ch2});
        } else {
            s = Character.toString(ch);
        }
        byte[] bs = s.getBytes(StandardCharsets.UTF_8);
        byteCount = bs.length;
        System.arraycopy(bs, 0, bytes, 0, byteCount);
        return read();
    }
}

Path source = Paths.get("...");
Path target = Paths.get("...");
try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
        InputStream in = new ReaderInputStream(reader)) {
    Files.copy(in, target);
}

Upvotes: 1

Jems

Reputation: 11620

Using the Apache Commons IO library, you can do this conversion in one line:

// import org.apache.commons.io.input.ReaderInputStream;

InputStream inputStream = new ReaderInputStream(reader, StandardCharsets.UTF_8);

You can read the documentation for this class at https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/ReaderInputStream.html

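Because it encodes the characters incrementally through an internal buffer instead of reading the whole Reader up front, it should also help with the memory issue, though it is worth trying on your data to confirm.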

Upvotes: 6
