pzl-kleur

Reputation: 35

How to split a string based on two regex formats?

I'm trying to split a string into an array based on two different regex delimiters, integers and non-integers, but I can't seem to get the results I want.

I have tried different combinations of string.split(regex) without success. If I use ([^0-9]+) I can successfully separate all the non-integers into their own array, but the integers are lost. If I try a combination such as ([^0-9]+)([0-9]+), I end up with strange results and not the desired output.

My first attempt was string.split(""), which splits the string by character so that every character, whatever its type, becomes a separate array item. But I need the digits grouped together so I can manipulate them, and I must be able to rebuild the original string at the end.

Given the string:

He1l0oo, th111s is my r@nd0m 86 str1ng

the output should be:

[He], [1], [l], [0], [oo, th], [111], [s is my r@nd], [0], [m ], [86], [ str], [1], [ng]

but I only get:

[1], [0], [111], [0], [86], [1]

I need both the non-integer and integer groups in the output so I can join the string back together in the same format, and with this output I lose everything else. Any help will be appreciated!

Upvotes: 0

Views: 1198

Answers (2)

Kevin Ji

Reputation: 10499

Try using the regex

"(?:\\d+|\\D+)"

This matches a run of digits or a run of non-digits, but not both.

Roughly, the code will look like the following:

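import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each match is one maximal run of digits or one maximal run of non-digits.
Pattern pattern = Pattern.compile("(?:\\d+|\\D+)");
Matcher matcher = pattern.matcher("He1l0oo, th111s is my r@nd0m 86 str1ng");

// Collect the runs in the order they appear, so joining them restores the input.
List<String> groups = new ArrayList<>();
while (matcher.find()) {
    groups.add(matcher.group());
}

System.out.println(groups);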

Upvotes: 0

John Bollinger

Reputation: 181008

The problem is that String.split() gives you only the pieces between delimiters. The delimiters themselves -- the substrings that match the pattern -- are omitted. But you don't have actual delimiters in your string. Rather, you want to split at transitions between digits and non-digits. These can be matched via zero-width assertions:

string.split("(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])");

That is

  • the position not preceded by a digit (?<![0-9]) and followed by a digit (?=[0-9])

or (|)

  • the position preceded by a digit (?<=[0-9]) and not followed by a digit (?![0-9])
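
As a minimal sketch of how this could be used on the string from the question, the class below splits at those zero-width boundaries and then joins the pieces again to check that nothing was lost. The class name DigitRunSplitter and the method splitRuns are hypothetical names chosen for illustration; only String.split, String.join and Arrays.toString are standard library calls.

```java
import java.util.Arrays;

public class DigitRunSplitter {
    // Zero-width pattern: matches only the boundary between a digit run
    // and a non-digit run, so no characters are consumed by the split.
    private static final String BOUNDARY =
            "(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])";

    // Hypothetical helper: returns alternating digit and non-digit runs.
    static String[] splitRuns(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        String input = "He1l0oo, th111s is my r@nd0m 86 str1ng";
        String[] parts = splitRuns(input);

        // Print the pieces; note that one piece ("oo, th") itself contains a comma.
        System.out.println(Arrays.toString(parts));

        // Because the split consumes nothing, joining restores the original.
        String rebuilt = String.join("", parts);
        System.out.println(rebuilt.equals(input));
    }
}
```

Since every piece is either all digits or contains no digits, the digit pieces can be modified individually before joining, which seems to be what the question is after.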

Upvotes: 1
