Reputation: 3657
How can I read a NUL-terminated UTF-8 string from a Java ByteBuffer, starting at ByteBuffer#position()?
ByteBuffer b = /* 61 62 63 64 00 31 32 34 00 (hex) */;
String s0 = /* read first string */;
String s1 = /* read second string */;
// `s0` will now contain “abcd” and `s1` will contain “124”.
I have already tried using Charsets.UTF_8.decode(b), but it seems this function ignores the current ByteBuffer position and reads until the end of the buffer.
Is there a more idiomatic way to read such a string from a byte buffer than seeking the byte containing 0 and then limiting the buffer to it (or copying the part with the string into a separate buffer)?
Upvotes: 3
Views: 1164
Reputation: 109613
In Java, the char \u0000 (the UTF-8 byte 0, the Unicode code point U+0000) is a normal char. So read everything (maybe into an oversized byte array) and do:
String s = new String(bytes, StandardCharsets.UTF_8);
String[] s0s1 = s.split("\u0000");
String s0 = s0s1[0];
String s1 = s0s1[1];
If you do not have fixed positions and must read every byte sequentially, the code is ugly. One of the C founders indeed called the NUL-terminated string a historic mistake.
For the reverse direction, when a Java String must not produce a 0 byte in its output (typically because it will be processed further as a C/C++ NUL-terminated string), there is modified UTF-8, which encodes the char \u0000 as the two bytes C0 80 instead of a single 0 byte.
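To make the difference concrete, here is a small sketch that compares standard UTF-8 with the modified UTF-8 written by DataOutputStream.writeUTF. The class and method names are made up for illustration. writeUTF also prepends a two-byte length, which the sketch strips off before printing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class ModifiedUtf8Demo {
    // Standard UTF-8: the char U+0000 becomes the single byte 0x00.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 as produced by DataOutputStream.writeUTF.
    // The first two bytes are a length prefix and are dropped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = out.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(standardUtf8("a\0b"))); // [97, 0, 98]
        System.out.println(Arrays.toString(modifiedUtf8("a\0b"))); // [97, -64, -128, 98]
    }
}
```

The bytes -64 and -128 are 0xC0 and 0x80 printed as signed Java bytes, so the modified form contains no 0 byte that a C reader could mistake for a terminator.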
Upvotes: 1
Reputation: 3270
You can do it with the replace and split functions. Decode your bytes to a String, replace each 0 char with a custom separator character, and then split the string on that separator.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
public class Jtest {
    public static void main(String[] args) {
        //ByteBuffer b = /* 61 62 63 64 00 31 32 34 00 (hex) */;
        ByteBuffer b = ByteBuffer.allocate(10);
        b.put((byte)0x61);
        b.put((byte)0x62);
        b.put((byte)0x63);
        b.put((byte)0x64);
        b.put((byte)0x00);
        b.put((byte)0x31);
        b.put((byte)0x32);
        b.put((byte)0x34);
        b.put((byte)0x00);
        // flip() sets the limit to the 9 bytes written, so the unused
        // tenth byte of the backing array is not decoded
        b.flip();
        // print the backing array of the ByteBuffer
        System.out.println("Original ByteBuffer: "
                + Arrays.toString(b.array()));
        // The words will be "abcd" and "124".
        String s = StandardCharsets.UTF_8.decode(b).toString();
        // ';' is an arbitrary separator; it must not occur in the data
        String ss = s.replace((char)0, ';');
        String[] words = ss.split(";");
        for (int i = 0; i < words.length; i++) {
            System.out.println(" Word " + i + " = " + words[i]);
        }
    }
}
I believe you can do it more efficiently by removing the replace step and splitting on the NUL character directly.
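A minimal sketch of that variant, splitting on the NUL character itself. It assumes everything between the buffer's position and limit consists of NUL-terminated strings. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class SplitOnNul {
    // Decodes the bytes between position and limit, then splits on U+0000.
    // String.split drops trailing empty strings, so the final terminator
    // does not produce an extra empty element.
    static String[] splitStrings(ByteBuffer buffer) {
        String decoded = StandardCharsets.UTF_8.decode(buffer).toString();
        return decoded.split("\0");
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(
                new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        for (String word : splitStrings(b)) {
            System.out.println(word); // prints "abcd", then "124"
        }
    }
}
```

Note that decode consumes the whole remaining buffer, so this only fits when nothing else follows the strings.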
Upvotes: 0
Reputation: 3091
Idiomatic in the sense of a "one-liner"? Not that I know of (unsurprising, since NUL-terminated strings are not part of the Java spec).
The first thing I came up with is using b.slice().limit(x) to create a lightweight view onto only the desired bytes (better than copying them anywhere, as you might be able to work directly with the buffer):
ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00 });
int i;
while (b.hasRemaining()) {
    ByteBuffer nextString = b.slice(); // View on b with same start position
    for (i = 0; b.hasRemaining() && b.get() != 0x00; i++) {
        // Count to next NUL
    }
    nextString.limit(i); // view now stops before NUL
    CharBuffer s = StandardCharsets.UTF_8.decode(nextString);
    System.out.println(s);
}
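The same idea can be wrapped in a helper that reads one string per call, which matches the s0/s1 usage in the question. This is only a sketch: readCString is a made-up name, and treating a missing terminator as "read to the limit" is an assumption you may want to replace with an exception.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class NulTerminated {
    // Reads one NUL-terminated UTF-8 string starting at buffer.position().
    // Afterwards the position is just past the NUL, or at the limit if
    // no NUL was found.
    static String readCString(ByteBuffer buffer) {
        ByteBuffer view = buffer.slice(); // shares content, starts at position
        int length = 0;
        while (buffer.hasRemaining() && buffer.get() != 0) {
            length++;
        }
        view.limit(length); // view now covers only the string bytes
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(
                new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        String s0 = readCString(b);
        String s1 = readCString(b);
        System.out.println(s0); // abcd
        System.out.println(s1); // 124
    }
}
```

Because the scan works on raw bytes, it is safe for multi-byte UTF-8 sequences: standard UTF-8 never uses the byte 0 for anything except U+0000.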
Upvotes: 6