Charles Morin

Reputation: 67

Split string array in Java using regex

I'm trying to split this string:

aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)

so that it looks like this array:

[ a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8) ]

Here are the rules: only the letters a to g are accepted. A letter can stand alone, but if parentheses follow it, the token has to include them and their content. The content of the parentheses must be a numeric value.

This is what I tried:

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
String[] a = content.split("[a-g]|[a-g]\\([0-9]*\\)");
for (String s : a) {
    System.out.println(s);
}

And here's the output:

(2)

(52)

(4)
(2)

(14)
(6)
(8)h(4)5(6)

Thanks.

Upvotes: 4

Views: 2783

Answers (4)

mettleap

Reputation: 1410

If you want to stick with the split method, here is an approach you could follow:

import java.util.Arrays;

public class Test 
{
   public static void main(String[] args)
   {
        String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
        String[] a = content.replaceAll("[a-g](\\([0-9]*\\))?|[a-g]", "$0:").split(":");
        // $0 is the string which matched the regex

        System.out.println(Arrays.toString(a));

   }

}

Regex: [a-g](\\([0-9]*\\))?|[a-g] matches the tokens you want (i.e. a, b, a(5) and so on). The trailing |[a-g] alternative is redundant, since the parenthesized group is already optional.

Using this regex, I first append a colon to every match, then split the resulting string on the colon with the split method.

The output of the above code is:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8), h(4)5(6)]

NOTE: This approach only works with a delimiter that is known not to occur in the input string. I chose a colon on the assumption that it is never part of the input.
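To make the delimiter caveat concrete, here is a minimal sketch that refuses to run when the chosen delimiter already occurs in the input. The class DelimiterSplitSketch and the helper tokenize are hypothetical names for illustration, and the redundant |[a-g] alternative is dropped from the regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSplitSketch {

    // Hypothetical helper: marks every match with the delimiter, then splits on it.
    static String[] tokenize(String input, String delimiter) {
        // The trick only works if the delimiter cannot be confused with input text.
        if (delimiter.isEmpty() || input.contains(delimiter)) {
            throw new IllegalArgumentException(
                    "delimiter must be non-empty and absent from the input");
        }
        // $0 is the whole match; quoteReplacement keeps '$' and '\' in the
        // delimiter from being read as replacement syntax.
        String marked = input.replaceAll(
                "[a-g](\\([0-9]*\\))?", "$0" + Matcher.quoteReplacement(delimiter));
        // Pattern.quote makes split treat the delimiter as literal text.
        return marked.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("aba(2)bb(52)", ":")));
        // '|' is a regex metacharacter, so the quoting above matters here.
        System.out.println(Arrays.toString(tokenize("gc(4)d(2)", "|")));
    }
}
```

Calling tokenize("a:b", ":") throws instead of silently producing wrong tokens. Note that text the regex does not match, such as the trailing h(4)5(6) in the original input, still ends up in the result.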

Upvotes: 1

Glains

Reputation: 2863

You can try the following regex: [a-g](\(.*?\))?

  • [a-g]: a single letter from a to g (required)
  • (\(.*?\))?: an optional group matching any characters between ( and ), as few as possible

You can view the expected output here.

This answer is based on Pattern and Matcher. An example:

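import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";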

Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
Matcher matcher = pattern.matcher(input);
List<String> tokens = new ArrayList<>();
while (matcher.find()) {
    tokens.add(matcher.group());
}

tokens.forEach(System.out::println);

Resulting output:

a
b
a(2)
b
b(52)
g
c(4)
d(2)
f
e(14)
f(6)
g(8)

Edit: Using [a-g](?:\((.*?)\))? you can also easily extract the inner value of a bracket:

while (matcher.find()) {
    tokens.add(matcher.group());
    tokens.add(matcher.group(1)); // the inner value or null if no () are present 
}
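A fuller sketch of the same idea, assuming the parentheses may only contain digits (so \d+ is swapped in for .*? here). The class InnerValueSketch and the helper describe are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnerValueSketch {

    // Group 1 captures the digits inside the parentheses, if any.
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\((\\d+)\\))?");

    // Hypothetical helper: pairs every letter with its numeric value, if present.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            char letter = matcher.group().charAt(0);
            String digits = matcher.group(1); // null when no parentheses follow the letter
            if (digits == null) {
                out.add(letter + " (no value)");
            } else {
                // parseInt throws NumberFormatException for values beyond the int range.
                out.add(letter + " -> " + Integer.parseInt(digits));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        describe("aba(2)bb(52)").forEach(System.out::println);
    }
}
```

Checking group(1) for null before using it avoids mixing null entries into the token list.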

Upvotes: 0

dognose

Reputation: 20889

Split is the wrong approach here, because it makes it hard to eliminate invalid entries.

Instead, match whatever is valid and process the resulting list of matches:

[a-g](?:\(\d+\))?

Debuggex Demo
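One possible way to collect the matches in Java, sketched here with Matcher.results(), which requires Java 9 or later. The class MatchStreamSketch and the helper tokens are hypothetical names, not something from this answer.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MatchStreamSketch {

    // A letter a-g, optionally followed by digits in parentheses.
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\(\\d+\\))?");

    // Hypothetical helper: streams over all matches and keeps the matched text.
    static List<String> tokens(String input) {
        return TOKEN.matcher(input)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)   // the full text of each match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokens("aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)"));
    }
}
```

Anything the pattern does not match, such as h(4)5(6), is simply skipped, which is why no cleanup step is needed afterwards.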

Upvotes: 0

Wiktor Stribiżew

Reputation: 626689

It is easier to match these substrings:

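import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";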
Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
List<String> res = new ArrayList<>();
Matcher matcher = pattern.matcher(content);
while (matcher.find()){
    res.add(matcher.group(0)); 
} 
System.out.println(res);

Output:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)]

See the Java demo and a regex demo.

Pattern details

  • [a-g] - a letter from a to g
  • (?:\(\d+\))? - an optional non-capturing group matching 1 or 0 occurrences of
    • \( - a ( char
    • \d+ - 1+ digits
    • \) - a ) char.
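To see why \d+ is the stricter choice, here is a small sketch that runs this pattern and the two looser ones from the other answers over a made-up input containing empty and non-numeric parentheses. The class PatternComparisonSketch, the helper findAll and the test input are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparisonSketch {

    // Hypothetical helper: returns every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a()b(x)c(3)"; // made-up input with invalid parentheses

        // \d+ needs at least one digit, so "()" and "(x)" are not consumed.
        System.out.println(findAll("[a-g](?:\\(\\d+\\))?", input));   // [a, b, c(3)]

        // [0-9]* also accepts zero digits, so "a()" matches as a whole.
        System.out.println(findAll("[a-g](\\([0-9]*\\))?", input));   // [a(), b, c(3)]

        // .*? accepts any characters between the parentheses.
        System.out.println(findAll("[a-g](\\(.*?\\))?", input));      // [a(), b(x), c(3)]
    }
}
```

On the question's own input all three patterns give the same tokens; they only differ once the parentheses hold something other than one or more digits.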

Upvotes: 1