Reputation: 11317
I need to split a String into an array of single character Strings.
E.g., splitting "cat" would give the array "c", "a", "t".
Upvotes: 152
Views: 459331
Reputation: 5924
I found this question because I am attempting to validate the contents of a fixed-length String identifier.
Rather than use a regex, I wanted to figure out how to use the Java Collections library, i.e. Set<String>, instead.
Here's what I created to validate a fixed-length (6 characters) String identifier, alpha characters only, prefixed with "X":
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
// Letters that may not appear in an identifier.
public static final Set<String> CHARACTERS_ALPHA_INVALID = Set.of("B", "D", "Q", "U");
// 'A'..'Z' (65 is the code of 'A') minus the invalid letters.
public static final Set<String> CHARACTERS_VALID = IntStream.range(0, 26)
        .mapToObj(index -> String.valueOf((char) (index + 65)))
        .filter(charAsString -> !CHARACTERS_ALPHA_INVALID.contains(charAsString))
        .collect(Collectors.toUnmodifiableSet());
// True if every single-character substring of candidate is a valid letter;
// throws if candidate is null or empty.
public static boolean isCharactersValid(String candidate) {
    return Optional.ofNullable(candidate)
            .filter(candidateNonNull -> !candidateNonNull.isEmpty())
            .map(candidateNonEmpty ->
                    Arrays.stream(candidateNonEmpty.split(""))
                            .allMatch(CHARACTERS_VALID::contains))
            .orElseThrow(() -> new IllegalArgumentException("candidate must not be null or empty"));
}
// True if candidate starts with "X", is exactly 6 characters long, and contains only valid letters.
public static boolean isInstanceOfNewXPrefixedSize6UserCode(String candidate) {
    return Optional.ofNullable(candidate)
            .map(candidateNonNull ->
                    candidateNonNull.startsWith("X") && (candidateNonNull.length() == 6)
                            && isCharactersValid(candidateNonNull))
            .orElse(false);
}
Upvotes: 0
Reputation: 39
In my previous answer I mixed up with JavaScript. Here goes an analysis of performance in Java.
I agree that Unicode surrogate pairs in a Java String need attention. They break the intuitive meaning of methods like String.length(), and even of Character itself, because a char is ultimately a technical object (a UTF-16 code unit) that may not represent one character in human language.
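To make the point about String.length() concrete, here is a minimal sketch. The class and method names are hypothetical and only serve the illustration; the sample string is "a💫b", written with escapes so the source encoding does not matter.

```java
final class SurrogateLengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F4AB (💫) is outside the BMP, so it is stored as a surrogate pair.
        String s = "a\uD83D\uDCABb";
        System.out.println(codeUnits(s));   // 4
        System.out.println(codePoints(s));  // 3
        // The char at index 1 is only the first half of the emoji.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```

The two counts differ by exactly the number of supplementary characters in the string.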
I implemented 4 methods that split a string into a list of character-representing strings (Strings corresponding to the human meaning of characters). Here is the result of the comparison:
A line is a String consisting of 1000 arbitrarily chosen emojis and 1000 ASCII characters (1000 times <emoji><ascii>, for a total of 2000 "characters" in the human sense).
(The 256 and 512 measurements are discarded.)
Implementations:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
// Streams over the code points; Character.toString(int) requires Java 11+.
public static List<String> toCharacterStringListWithCodePoints(String str) {
    if (str == null) {
        return Collections.emptyList();
    }
    return str.codePoints()
            .mapToObj(Character::toString)
            .collect(Collectors.toList());
}
// Walks the char array and keeps each surrogate pair together in one String.
public static List<String> toCharacterStringListWithIfBlock(String str) {
    if (str == null) {
        return Collections.emptyList();
    }
    List<String> strings = new ArrayList<>();
    char[] charArray = str.toCharArray();
    int delta = 1;
    for (int i = 0; i < charArray.length; i += delta) {
        delta = 1;
        if (i < charArray.length - 1 && Character.isSurrogatePair(charArray[i], charArray[i + 1])) {
            // Consume both halves of the pair in this iteration.
            delta = 2;
            strings.add(String.valueOf(new char[]{ charArray[i], charArray[i + 1] }));
        } else {
            strings.add(Character.toString(charArray[i]));
        }
    }
    return strings;
}
// Compiled once; splits at every position that is preceded by a character.
static final Pattern p = Pattern.compile("(?<=.)");
public static List<String> toCharacterStringListWithRegex(String str) {
    if (str == null) {
        return Collections.emptyList();
    }
    return Arrays.asList(p.split(str));
}
Annex (RAW DATA):
codePoints;classic;regex;lines
45;44;84;256
14;20;98;512
29;42;91;1024
52;56;99;2048
87;121;174;4096
175;221;375;8192
345;411;839;16384
667;826;1285;32768
1277;1536;2440;65536
2426;2938;4238;131072
Upvotes: 1
Reputation: 39
We can do this simply by
const string = 'hello';
console.log([...string]); // -> ['h','e','l','l','o']
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax says
Spread syntax (...) allows an iterable such as an array expression or string to be expanded...
So, strings can be quite simply spread into arrays of characters.
Upvotes: 0
Reputation:
split("(?!^)") does not work correctly if the string contains surrogate pairs. You should use split("(?<=.)").
String[] splitted = "花ab🌹🌺🌷".split("(?<=.)");
System.out.println(Arrays.toString(splitted));
output:
[花, a, b, 🌹, 🌺, 🌷]
Upvotes: 6
Reputation: 1676
If the original string contains supplementary Unicode characters, then split() would not work, as it splits these characters into surrogate pairs. To handle these special characters correctly, code like this works:
String[] chars = new String[stringToSplit.codePointCount(0, stringToSplit.length())];
for (int i = 0, j = 0; i < stringToSplit.length(); j++) {
    int cp = stringToSplit.codePointAt(i);
    char[] c = Character.toChars(cp);
    chars[j] = new String(c);
    // Advance by 1 or 2 chars, depending on the code point.
    i += Character.charCount(cp);
}
Upvotes: 1
Reputation: 503
To sum up the other answers...
This works on all Java versions:
"cat".split("(?!^)")
This only works on Java 8 and up:
"cat".split("")
Upvotes: 5
Reputation: 420
If characters beyond the Basic Multilingual Plane are expected in the input (some CJK characters, new emoji...), approaches such as "a💫b".split("(?!^)") cannot be used, because they break such characters (resulting in the array ["a", "?", "?", "b"]), and something safer has to be used:
"a💫b".codePoints()
.mapToObj(cp -> new String(Character.toChars(cp)))
.toArray(size -> new String[size]);
Upvotes: 12
Reputation: 718708
An efficient way of turning a String into an array of one-character Strings would be to do this:
String[] res = new String[str.length()];
for (int i = 0; i < str.length(); i++) {
res[i] = Character.toString(str.charAt(i));
}
However, this does not take into account the fact that a char in a String could actually represent half of a Unicode code point (if the code point is not in the BMP). To deal with that you need to iterate through the code points, which is more complicated.
This approach will be faster than using String.split(/* clever regex */), and it will probably be faster than using Java 8+ streams. It is probably faster than this:
String[] res = new String[str.length()];
int i = 0;
for (char ch : str.toCharArray()) {
    res[i++] = Character.toString(ch);
}
because toCharArray has to copy the characters to a new array.
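The code-point iteration mentioned above might look like the following sketch. It is one possible way to do it, not a benchmarked implementation, and the class name CodePointSplitter is hypothetical.

```java
import java.util.Arrays;

final class CodePointSplitter {
    // Splits str into one String per Unicode code point, so a surrogate
    // pair stays together in a single element.
    static String[] split(String str) {
        String[] res = new String[str.codePointCount(0, str.length())];
        int j = 0;
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            int len = Character.charCount(cp); // 1 for BMP, 2 for supplementary
            res[j++] = str.substring(i, i + len);
            i += len;
        }
        return res;
    }

    public static void main(String[] args) {
        // "a💫b" written with escapes: three code points, four chars.
        System.out.println(Arrays.toString(split("a\uD83D\uDCABb")));
        System.out.println(split("a\uD83D\uDCABb").length); // 3
    }
}
```

For strings that contain only BMP characters this produces the same array as the charAt loop, at the cost of the extra codePointCount pass.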
Upvotes: 3
Reputation: 45
for (int i = 0; i < str.length(); i++) {
    System.out.println(str.charAt(i));
}
Upvotes: 2
Reputation: 68667
"cat".toCharArray()
But if you need strings
"cat".split("")
Edit: which will return an empty first value (on Java versions before 8).
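A small sketch of the String-based variant, assuming it is compiled and run on Java 8 or later, where split("") no longer produces a leading empty string. The lookahead regex "(?!^)" is included for comparison because it avoids the empty first element on older versions as well. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class SplitEmptyRegexDemo {
    // Splits between every pair of chars; on Java 7 and earlier the result
    // starts with an empty string, on Java 8+ it does not.
    static String[] splitWithEmptyRegex(String s) {
        return s.split("");
    }

    // Splits at every position except the start of the input.
    static String[] splitWithLookahead(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithEmptyRegex("cat"))); // [c, a, t] on Java 8+
        System.out.println(Arrays.toString(splitWithLookahead("cat")));  // [c, a, t]
    }
}
```

Both variants split per char, so they are only safe when the input contains no surrogate pairs.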
Upvotes: 135
Reputation: 1833
Maybe you can use a for loop that goes through the String content and extracts it character by character using the charAt method.
Combined with an ArrayList<String>, for example, you can get your array of individual characters.
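The idea could be sketched roughly as follows. The class name CharAtSplitter is hypothetical, and the loop works per char, so characters outside the BMP would end up split into two elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CharAtSplitter {
    // Collects each char of str as a one-character String, then converts
    // the list to an array.
    static String[] split(String str) {
        List<String> chars = new ArrayList<>(str.length());
        for (int i = 0; i < str.length(); i++) {
            chars.add(String.valueOf(str.charAt(i)));
        }
        return chars.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("cat"))); // [c, a, t]
    }
}
```

Since the final size is known up front, writing straight into a String[] of length str.length() would avoid the intermediate list.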
Upvotes: 1