Reputation: 3189
I have a string that looks like this: a👏b🙂c
and I want to split it into single characters/strings.
static List<String> split(String text) {
    List<String> list = new ArrayList<>(text.length());
    for (int i = 0; i < text.length(); i++) {
        list.add(text.substring(i, i + 1));
    }
    return list;
}

public static void main(String... args) {
    split("a\uD83D\uDC4Fb\uD83D\uDE42c")
        .forEach(System.out::println);
}
As you may have noticed, instead of 👏 and 🙂 I'm getting two weird characters for each:
a
?
?
b
?
?
c
Upvotes: 4
Views: 715
Reputation: 7792
There is an open-source library, MgntUtils (written by me), with a utility that converts any string into a sequence of Unicode escapes and vice versa, handling code points correctly. It can help you with your problem and also help you understand what is going on behind the scenes. Here is an example:
The code below
String result = "a👏b🙂c";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);
would produce the following:
\u0061\u1f44f\u0062\u1f642\u0063
a👏b🙂c
Here is the link to the article that explains the MgntUtils library and where to get it (including Javadoc and source code): Open Source Java library with stack trace filtering, Silent String parsing Unicode converter and Version comparison. Look for the paragraph "String Unicode converter".
Upvotes: 0
Reputation: 44952
As per the Character and String API docs, you need to work with code points to correctly handle characters that UTF-16 encodes as a surrogate pair (two char values).
"a👏b🙂c".codePoints().mapToObj(Character::toChars).forEach(System.out::println);
This will output:
a
👏
b
🙂
c
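To see why the `substring(i, i + 1)` loop from the question produces the `?` pairs, it may help to compare the number of UTF-16 code units with the number of code points. The sketch below is only an illustration: the class name `CodeUnitsDemo` and its two helper methods are made up for this example, while `String.length()`, `String.codePointCount`, `Character.isHighSurrogate` and `Character.isLowSurrogate` are standard JDK methods.

```java
public class CodeUnitsDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDC4Fb\uD83D\uDE42c";

        System.out.println(codeUnits(text));  // 7
        System.out.println(codePoints(text)); // 5

        // The emoji at index 1 is stored as a high surrogate followed by a low surrogate.
        System.out.println(Character.isHighSurrogate(text.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(text.charAt(2)));  // true
    }
}
```

Each emoji occupies two `char` values, so taking one `char` at a time yields two unpaired surrogates, which the console prints as `?`.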
Upvotes: 6
Reputation: 4496
The following will do the job:
List<String> split(String text) {
    return text.codePoints()
        .mapToObj(Character::toChars)
        .map(String::valueOf)
        .collect(Collectors.toList());
}
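If you would rather keep the shape of the original loop, a possible non-stream variant is to advance the index by the width of each code point instead of by one. This is a minimal sketch, not a replacement for the stream version above; the class name `CodePointSplit` is hypothetical, and it relies only on the standard `String.codePointAt` and `Character.charCount` methods.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointSplit {

    // Splits the text into one string per Unicode code point.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            // 1 for BMP characters, 2 for characters stored as a surrogate pair.
            int width = Character.charCount(codePoint);
            parts.add(text.substring(i, i + width));
            i += width;
        }
        return parts;
    }

    public static void main(String[] args) {
        split("a\uD83D\uDC4Fb\uD83D\uDE42c").forEach(System.out::println);
    }
}
```

Note that both this loop and the stream version split by code point, so a sequence made of several code points (for example, a letter followed by a combining accent) would still be split into separate strings.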
Upvotes: 6