mmascosta

Reputation: 131

Java: StringBuffer to byte[] without toString

The title says it all. Is there any way to convert a StringBuilder to a byte[] without going through a String in between?

The problem is that I'm handling REALLY large strings (millions of chars), and I have a loop that appends a char to the end and then obtains the byte[]. Converting the StringBuffer to a String on every iteration makes this loop very, very slow.

Is there any way to accomplish this? Thanks in advance!

Upvotes: 13

Views: 14262

Answers (6)

Haroldo_OK

Reputation: 7230

If you're willing to replace the StringBuilder with something else, yet another possibility would be a Writer backed by a ByteArrayOutputStream:

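import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

ByteArrayOutputStream bout = new ByteArrayOutputStream();
// Name the charset explicitly instead of relying on the platform default.
Writer writer = new OutputStreamWriter(bout, StandardCharsets.UTF_8);
try {
    writer.write("String A");
    writer.write("String B");
    // OutputStreamWriter buffers internally; flush before reading the bytes.
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
// Arrays.toString prints the contents rather than the array's identity.
System.out.println(Arrays.toString(bout.toByteArray()));

try {
    writer.write("String C");
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(Arrays.toString(bout.toByteArray()));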

As always, your mileage may vary.

Upvotes: 2

VGR

Reputation: 44292

As many have already suggested, you can use the CharBuffer class, but allocating a new CharBuffer would only make your problem worse.

Instead, you can directly wrap your StringBuilder in a CharBuffer, since StringBuilder implements CharSequence:

Charset charset = StandardCharsets.UTF_8;
CharsetEncoder encoder = charset.newEncoder();

// No allocation performed, just wraps the StringBuilder.
CharBuffer buffer = CharBuffer.wrap(stringBuilder);

ByteBuffer bytes = encoder.encode(buffer);

EDIT: Duarte correctly points out that the CharsetEncoder.encode method may return a buffer whose backing array is larger than the actual data—meaning, its capacity is larger than its limit. It is necessary either to read from the ByteBuffer itself, or to read a byte array out of the ByteBuffer that is guaranteed to be the right size. In the latter case, there's no avoiding having two copies of the bytes in memory, albeit briefly:

ByteBuffer byteBuffer = encoder.encode(buffer);

byte[] array;
int arrayLen = byteBuffer.limit();
if (arrayLen == byteBuffer.capacity()) {
    array = byteBuffer.array();
} else {
    // This will place two copies of the byte sequence in memory,
    // until byteBuffer gets garbage-collected (which should happen
    // pretty quickly once the reference to it is null'd).

    array = new byte[arrayLen];
    byteBuffer.get(array);
}

byteBuffer = null;
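If the loop only needs to read the bytes rather than own a fresh byte[] each time, one possible variation is to reuse a single output ByteBuffer across iterations. The sketch below is an assumption on my part, not something from the answer: the class name ReusableEncoder, the 1024-byte starting size, and the choice to replace malformed input are all hypothetical. It uses the three-argument CharsetEncoder.encode followed by flush, and it still re-encodes the whole sequence on every call; it only avoids allocating a new output buffer each time.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a CharSequence into one reusable ByteBuffer.
final class ReusableEncoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Starting size is arbitrary; the buffer grows when the input needs it.
    private ByteBuffer out = ByteBuffer.allocate(1024);

    // Returns the internal buffer, ready for reading from position() to limit().
    // The contents are only valid until the next call to encode().
    ByteBuffer encode(CharSequence chars) {
        long worstCase = (long) Math.ceil(chars.length() * (double) encoder.maxBytesPerChar());
        if (worstCase > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("input too large for a single ByteBuffer");
        }
        if (out.capacity() < worstCase) {
            out = ByteBuffer.allocate((int) worstCase);
        }
        out.clear();
        encoder.reset();
        // Wrapping does not copy the chars; endOfInput = true because this is the whole input.
        CoderResult result = encoder.encode(CharBuffer.wrap(chars), out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (!result.isUnderflow()) {
            throw new IllegalStateException("unexpected coder result: " + result);
        }
        out.flip();
        return out;
    }
}
```

The caller reads from the returned buffer directly (for example by passing it to a channel) and must finish with it before the next call.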

Upvotes: 14

user1454926

Reputation: 199

Unfortunately, the answers above that deal with ByteBuffer's array() method are a bit buggy. The trouble is that the backing byte[] is likely to be bigger than the data it holds, so the array ends in trailing zero bytes that are hard to get rid of, since you can't resize arrays in Java.
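To make the size mismatch concrete, here is a minimal sketch of copying only the readable region of a ByteBuffer into a right-sized array. The class and method names (BufferBytes, exactBytes) are made up for illustration. It assumes the buffer has already been flipped for reading, so the valid bytes run from position() to limit().

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class BufferBytes {
    private BufferBytes() {}

    // Returns a new array holding exactly the bytes between position() and limit().
    // The buffer's own position is left unchanged.
    static byte[] exactBytes(ByteBuffer buffer) {
        if (!buffer.hasArray()) {
            // Direct or read-only buffers expose no backing array; copy through a duplicate.
            byte[] copy = new byte[buffer.remaining()];
            buffer.duplicate().get(copy);
            return copy;
        }
        // array() returns the whole backing array, so trim it to the readable region.
        int start = buffer.arrayOffset() + buffer.position();
        return Arrays.copyOfRange(buffer.array(), start, start + buffer.remaining());
    }
}
```

This still makes one copy, so it does not remove the cost; it only guarantees that the resulting array has no trailing slack.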

Here is an article that explains this in more detail: http://worldmodscode.wordpress.com/2012/12/14/the-java-bytebuffer-a-crash-course/

Upvotes: 1

Peter Lawrey

Reputation: 533442

If you want performance, I wouldn't use StringBuilder or create a byte[]. Instead, you can write progressively to the stream that will take the data in the first place. If you can't do that, you can copy the data from the StringBuilder to the Writer, but it is much faster not to create the StringBuilder in the first place.
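As a rough sketch of what writing progressively could look like: each piece goes to the destination stream as soon as it is produced, so no StringBuilder or whole-text byte[] is built. The names here (StreamingWrite, writePieces) are hypothetical, and UTF-8 is an assumed encoding since the question does not name one.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class StreamingWrite {
    private StreamingWrite() {}

    // Encodes and writes each piece to the destination as it arrives.
    // The caller keeps ownership of the stream and is responsible for closing it.
    static void writePieces(Iterable<String> pieces, OutputStream destination) {
        try {
            Writer writer = new OutputStreamWriter(destination, StandardCharsets.UTF_8);
            for (String piece : pieces) {
                writer.write(piece);
            }
            // Push any chars still buffered inside the writer down to the stream.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use the destination would be the socket or file stream that ultimately consumes the data; a ByteArrayOutputStream works as a stand-in when trying it out.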

Upvotes: 0

tolitius

Reputation: 22499

What are you trying to accomplish with "millions of chars"? Are these logs that need to be parsed? Can you read them as just bytes and stick to a ByteBuffer? Then you can do:

buffer.array()

to get a byte[]

Depending on what you are doing, you can also use just a char[] or a CharBuffer:

CharBuffer cb = CharBuffer.allocate(4242);
cb.put("Depends on what it is you need to do");
... 

Then you can get a char[] as:

cb.array()
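One thing to watch: array() on an allocated CharBuffer hands back the whole backing array (all 4242 slots here), not just the part that has been written. A small sketch of trimming it to the written portion, with made-up names (CharBufferDemo, writtenChars):

```java
import java.nio.CharBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class CharBufferDemo {
    private CharBufferDemo() {}

    // Puts each piece into a fixed-size CharBuffer and returns only the chars written.
    static char[] writtenChars(String... pieces) {
        CharBuffer cb = CharBuffer.allocate(4242);
        for (String piece : pieces) {
            cb.put(piece);
        }
        // position() marks the end of the written data; the rest of array() is unused.
        return Arrays.copyOf(cb.array(), cb.position());
    }
}
```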

It's always good to REPL things out, it's fun and proves the point. Java REPL is not something we are accustomed to, but hey, there is Clojure to save the day which speaks Java fluently:

user=> (import java.nio.CharBuffer)
java.nio.CharBuffer

user=> (def cb (CharBuffer/allocate 4242))
#'user/cb

user=> (-> (.put cb "There Be") (.array))
#<char[] [C@206564e9>

user=> (-> (.put cb " Dragons") (.array) (String.))
"There Be Dragons"

Upvotes: 0

For starters, you should probably be using StringBuilder, since StringBuffer has synchronization overhead that's usually unnecessary.

Unfortunately, there's no way to go directly to bytes, but you can copy the chars into an array or iterate from 0 to length() and read each charAt().
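Both options could look roughly like the sketch below. The names (CharCopy, toChars, toLatin1Bytes) are hypothetical. The charAt loop shown is only correct under the assumption that every char fits in a single byte (ISO-8859-1); for UTF-8 or other multi-byte encodings a CharsetEncoder is needed instead.

```java
// Hypothetical helper for illustration only.
final class CharCopy {
    private CharCopy() {}

    // Bulk-copies the builder's chars into a new array without creating a String.
    static char[] toChars(StringBuilder sb) {
        char[] chars = new char[sb.length()];
        sb.getChars(0, sb.length(), chars, 0);
        return chars;
    }

    // Iterates with charAt and narrows each char to one byte.
    // Assumes ISO-8859-1 content; rejects anything above 0xFF.
    static byte[] toLatin1Bytes(CharSequence chars) {
        byte[] bytes = new byte[chars.length()];
        for (int i = 0; i < bytes.length; i++) {
            char c = chars.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("char at index " + i + " is not ISO-8859-1");
            }
            bytes[i] = (byte) c;
        }
        return bytes;
    }
}
```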

Upvotes: 1
