Reputation: 35
I'm trying to split a string into an array based on two different regex delimiters, integers and non-integers, but I can't seem to get the results I want.
I have tried different combinations of string.split(regex) without success. If I use ([^0-9]+) I can successfully separate all non-integers together into its own array, but the integers are lost. If I try a combination such as ([^0-9]+)([0-9]+) I end up with strange results and not the desired output.
My first attempt was to split the string character by character with string.split(""), so that every character, whatever its type, becomes a separate array item. But I need the numbers grouped together so I can manipulate them, and I must be able to restore the original string at the end.
Given the string:
He1l0oo, th111s is my r@nd0m 86 str1ng
the output should be:
[He], [1], [l], [0], [oo, th], [111], [s is my r@nd], [0], [m ], [86], [ str], [1], [ng]
but I only get:
[1], [0], [111], [0], [86], [1]
I need both the non-integer and integer groups in the output so I can join the string back together in the same format, and with this output I lose everything else. Any help will be appreciated!
Upvotes: 0
Views: 1198
Reputation: 10499
Try using the regex
"(?:\\d+|\\D+)"
This matches either a run of digits or a run of non-digits, but never a mix of both in the same match.
Roughly, the code will look like the following:
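import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each find() returns the next maximal run of digits or of non-digits.
Pattern pattern = Pattern.compile("(?:\\d+|\\D+)");
Matcher matcher = pattern.matcher("He1l0oo, th111s is my r@nd0m 86 str1ng");
List<String> groups = new ArrayList<>();
while (matcher.find()) {
    groups.add(matcher.group());
}
System.out.println(groups);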
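Since the question also asks for the original string to be recoverable, here is a minimal self-contained sketch of the same find() loop wrapped in a class, with a check that joining the groups gives back the input. The class name DigitRunTokenizer and the method name tokenize are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitRunTokenizer {
    // A run of digits or a run of non-digits; the two alternatives never overlap.
    private static final Pattern RUNS = Pattern.compile("\\d+|\\D+");

    // Collects every match in order, so no character of the input is dropped.
    static List<String> tokenize(String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = RUNS.matcher(input);
        while (matcher.find()) {
            groups.add(matcher.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "He1l0oo, th111s is my r@nd0m 86 str1ng";
        List<String> groups = tokenize(input);
        System.out.println(groups);
        // Joining with an empty separator rebuilds the original string.
        System.out.println(String.join("", groups).equals(input));
    }
}
```

Because every character belongs to exactly one match, the list can be edited element by element and then joined again without losing the surrounding text.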
Upvotes: 0
Reputation: 181008
The problem is that String.split() gives you only the pieces between delimiters. The delimiters themselves -- the substrings that match the pattern -- are omitted. But you don't have actual delimiters in your string. Rather, you want to split at transitions between digits and non-digits. These can be matched via zero-width assertions:
string.split("(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])");
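That is, split at every position that is not after a digit (?<![0-9]) and before a digit (?=[0-9]), or (|) after a digit (?<=[0-9]) and not before a digit (?![0-9]). Because these assertions match positions rather than characters, nothing is consumed and every character ends up in one of the pieces.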
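As a rough sketch of how the lookaround pattern above could be used end to end, the following class splits the sample string, rewrites only the digit runs, and glues the pieces back together. The names TransitionSplit, splitRuns and bracketNumbers are hypothetical, and wrapping each number in square brackets is only a stand-in for whatever manipulation is actually needed. It assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
final class TransitionSplit {
    // Zero-width boundary between a non-digit and a digit, in either direction.
    private static final String BOUNDARY =
            "(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])";

    // Splits the input into alternating runs of digits and non-digits.
    static String[] splitRuns(String input) {
        return input.split(BOUNDARY);
    }

    // Placeholder manipulation: wrap each digit run in square brackets
    // and copy every other piece through unchanged.
    static String bracketNumbers(String input) {
        StringBuilder out = new StringBuilder();
        for (String piece : splitRuns(input)) {
            if (piece.matches("[0-9]+")) {
                out.append('[').append(piece).append(']');
            } else {
                out.append(piece);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "He1l0oo, th111s is my r@nd0m 86 str1ng";
        // Joining the untouched pieces reproduces the input exactly.
        System.out.println(String.join("", splitRuns(input)).equals(input));
        System.out.println(bracketNumbers(input));
    }
}
```

Since the split consumes no characters, the only differences between the input and the rebuilt string are the ones the manipulation step introduces.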
Upvotes: 1