Reputation: 131
The title says it all. Is there any way to convert from StringBuilder to byte[] without using a String in the middle?
The problem is that I'm managing REALLY large strings (millions of chars), and I have a loop that appends a char at the end and then obtains the byte[]. Converting the StringBuffer to a String on every iteration makes this loop very, very slow.
Is there any way to accomplish this? Thanks in advance!
Upvotes: 13
Views: 14262
Reputation: 7230
If you're willing to replace the StringBuilder with something else, yet another possibility would be a Writer backed by a ByteArrayOutputStream:
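ByteArrayOutputStream bout = new ByteArrayOutputStream();
// Name the charset explicitly instead of relying on the platform default.
Writer writer = new OutputStreamWriter(bout, StandardCharsets.UTF_8);
try {
    writer.write("String A");
    writer.write("String B");
    // OutputStreamWriter buffers internally, so flush before reading the bytes.
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
// Arrays.toString prints the byte values rather than the array reference.
System.out.println(Arrays.toString(bout.toByteArray()));
try {
    writer.write("String C");
    writer.flush();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(Arrays.toString(bout.toByteArray()));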
As always, your mileage may vary.
Upvotes: 2
Reputation: 44292
As many have already suggested, you can use the CharBuffer class, but allocating a new CharBuffer would only make your problem worse.
Instead, you can directly wrap your StringBuilder in a CharBuffer, since StringBuilder implements CharSequence:
Charset charset = StandardCharsets.UTF_8;
CharsetEncoder encoder = charset.newEncoder();
// No allocation performed, just wraps the StringBuilder.
CharBuffer buffer = CharBuffer.wrap(stringBuilder);
ByteBuffer bytes = encoder.encode(buffer);
EDIT: Duarte correctly points out that the CharsetEncoder.encode method may return a buffer whose backing array is larger than the actual data, meaning its capacity is larger than its limit. It is necessary either to read from the ByteBuffer itself, or to read a byte array out of the ByteBuffer that is guaranteed to be the right size. In the latter case, there's no avoiding having two copies of the bytes in memory, albeit briefly:
ByteBuffer byteBuffer = encoder.encode(buffer);
byte[] array;
int arrayLen = byteBuffer.limit();
if (arrayLen == byteBuffer.capacity()) {
array = byteBuffer.array();
} else {
// This will place two copies of the byte sequence in memory,
// until byteBuffer gets garbage-collected (which should happen
// pretty quickly once the reference to it is null'd).
array = new byte[arrayLen];
byteBuffer.get(array);
}
byteBuffer = null;
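As a rough, self-contained sketch of the approach above: the class name BuilderBytes and the helper toUtf8 are hypothetical, and the helper always copies into a right-sized array rather than trying to reuse the backing array. It wraps the checked CharacterCodingException in an UncheckedIOException purely to keep the call site short.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class BuilderBytes {
    // Hypothetical helper: encodes the builder's current contents as UTF-8
    // without creating an intermediate String.
    static byte[] toUtf8(StringBuilder sb) {
        try {
            // CharBuffer.wrap only creates a read-only view over the builder.
            ByteBuffer encoded = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(sb));
            // remaining() is the number of encoded bytes; the backing array may be longer.
            byte[] result = new byte[encoded.remaining()];
            encoded.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Thrown for input that cannot be encoded, such as an unpaired surrogate.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("h\u00e9llo"); // U+00E9 takes two bytes in UTF-8
        System.out.println(toUtf8(sb).length); // 6
        sb.append('!');
        System.out.println(toUtf8(sb).length); // 7
    }
}
```

Note that each call still re-encodes the whole builder, so this removes the String copy but not the per-iteration encoding cost.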
Upvotes: 14
Reputation: 199
Unfortunately, the answers above that deal with ByteBuffer's array() method are a bit buggy. The trouble is that the backing byte[] is likely to be bigger than you'd expect, so it will contain trailing zero bytes that are hard to get rid of, since you can't resize arrays in Java.
Here is an article that explains this in more detail: http://worldmodscode.wordpress.com/2012/12/14/the-java-bytebuffer-a-crash-course/
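A small sketch of the pitfall, using a buffer whose capacity is deliberately larger than its contents. The class TrimDemo and the helper exactBytes are hypothetical names chosen for illustration; the point is only that array() exposes the whole backing array, while copying remaining() bytes yields exactly the data.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class TrimDemo {
    // Hypothetical helper: copies only the bytes between position and limit.
    static byte[] exactBytes(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        // duplicate() shares the content but has its own position,
        // so the caller's buffer is left untouched.
        buf.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8); // capacity larger than the data
        buf.put((byte) 'a').put((byte) 'b').put((byte) 'c');
        buf.flip(); // position = 0, limit = 3

        System.out.println(Arrays.toString(buf.array()));     // [97, 98, 99, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(exactBytes(buf))); // [97, 98, 99]
    }
}
```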
Upvotes: 1
Reputation: 533442
If you want performance, I wouldn't use StringBuilder or create a byte[]. Instead, you can write progressively to the stream that will take the data in the first place. If you can't do that, you can copy the data from the StringBuilder to the Writer, but it is much faster not to create the StringBuilder in the first place.
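A minimal sketch of what writing progressively could look like, assuming the final destination is some OutputStream. Here a ByteArrayOutputStream stands in for a file or socket, and the names StreamDirect and writeAll are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class StreamDirect {
    // Hypothetical helper: encodes each piece straight into the destination
    // stream, so no StringBuilder or whole-content byte[] is ever built.
    static void writeAll(OutputStream dest, String... pieces) {
        try {
            Writer writer = new OutputStreamWriter(dest, StandardCharsets.UTF_8);
            for (String piece : pieces) {
                writer.write(piece);
            }
            // Push buffered characters through; the caller still owns dest.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real destination (file, socket, ...).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeAll(sink, "String A", "String B");
        System.out.println(sink.size()); // 16
    }
}
```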
Upvotes: 0
Reputation: 22499
What are you trying to accomplish with "millions of chars"? Are these logs that need to be parsed? Can you read the data as plain bytes and stick to a ByteBuffer? Then you can do:
buffer.array()
to get a byte[].
Depending on what you are doing, you can also use just a char[] or a CharBuffer:
CharBuffer cb = CharBuffer.allocate(4242);
cb.put("Depends on what it is you need to do");
...
Then you can get a char[] as:
cb.array()
It's always good to REPL things out, it's fun and proves the point. Java REPL is not something we are accustomed to, but hey, there is Clojure to save the day which speaks Java fluently:
user=> (import java.nio.CharBuffer)
java.nio.CharBuffer
user=> (def cb (CharBuffer/allocate 4242))
#'user/cb
user=> (-> (.put cb "There Be") (.array))
#<char[] [C@206564e9>
user=> (-> (.put cb " Dragons") (.array) (String.))
"There Be Dragons"
Upvotes: 0
Reputation: 77167
For starters, you should probably be using StringBuilder, since StringBuffer has synchronization overhead that's usually unnecessary.
Unfortunately, there's no way to go directly to bytes, but you can copy the chars into an array or iterate from 0 to length() and read each charAt().
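Both options can be sketched roughly as follows. The class CharCopy and its two helpers are hypothetical names, and the byte conversion assumes ISO-8859-1 content (every char at most 0xFF), which is an assumption made for the example and not something stated in the question. Other encodings need a real encoder.

```java
import java.util.Arrays;

final class CharCopy {
    // Copies the builder's chars into a fresh array without creating a String.
    static char[] copyChars(StringBuilder sb) {
        char[] out = new char[sb.length()];
        sb.getChars(0, sb.length(), out, 0);
        return out;
    }

    // Iterates with charAt(). Assumption: the text is ISO-8859-1,
    // so each char maps to exactly one byte.
    static byte[] latin1Bytes(StringBuilder sb) {
        byte[] out = new byte[sb.length()];
        for (int i = 0; i < sb.length(); i++) {
            char c = sb.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("Not an ISO-8859-1 char at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        System.out.println(Arrays.toString(copyChars(sb)));   // [a, b, c]
        System.out.println(Arrays.toString(latin1Bytes(sb))); // [97, 98, 99]
    }
}
```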
Upvotes: 1