panagdu

Reputation: 2133

Pattern/Matcher vs String.split() for the same regex

Why does Pattern/Matcher work with (\\d+)([a-zA-Z]+) but String.split() doesn't?

For example:

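import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "1A2B";

Pattern p = Pattern.compile("(\\d+)([a-zA-Z]+)");
Matcher m = p.matcher(line);
System.out.println(m.groupCount()); // number of capturing groups in the pattern

while (m.find()) {
    System.out.println(m.group()); // the whole match, e.g. "1A"
}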

Prints:

2
1A
2B

But:

    String line = "1A2B";
    String [] arrayOfStrings = line.split("(\\d+)([a-zA-Z]+)");
    System.out.println(arrayOfStrings.length);

    for(String elem: arrayOfStrings){
        System.out.println(elem);
    }

Prints only:

0

Upvotes: 0

Views: 370

Answers (2)

nu11p01n73R

Reputation: 26667

Why it didn't work

Because split consumes the characters that the regex matches. Here the matches cover the whole string, so no characters are left over to appear in the output array.
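One way to make the consumed matches visible is to pass a negative limit to split, which keeps the trailing empty strings that the default call drops. A minimal sketch (the class and method names are only illustrative):

```java
import java.util.Arrays;

class SplitLimitDemo {
    static final String REGEX = "(\\d+)([a-zA-Z]+)";

    // Default split (limit 0): trailing empty strings are removed.
    static String[] splitDefault(String line) {
        return line.split(REGEX);
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepEmpty(String line) {
        return line.split(REGEX, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault("1A2B")));   // []
        System.out.println(Arrays.toString(splitKeepEmpty("1A2B"))); // [, , ]
    }
}
```

With the negative limit the array has three empty strings: the piece before 1A, the piece between 1A and 2B, and the piece after 2B.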

Solution

Not perfect, but lookaheads will help:

String line = "1A2B";
// Zero-width lookahead: split before each digits+letters token without consuming it
String[] arrayOfStrings = line.split("(?=\\d+[a-zA-Z]+)");
System.out.println(arrayOfStrings.length);

for (String elem : arrayOfStrings) {
    System.out.println(elem);
}
This gives the output:

3

1A
2B

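Not perfect because the lookahead also matches at the start of the string, which creates an empty string at index 0 of the output array. In the example you can see that the length is 3, whereas we expect 2. (This is the behaviour on Java 7 and earlier. From Java 8 onward, a zero-width match at the start of the input no longer produces a leading empty string, so the same code prints 2, followed by 1A and 2B.)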
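If every token is digits followed by letters, one way to avoid the empty string at the start is to split only at the boundary between a letter and a digit, using a lookbehind together with a lookahead. This is a sketch under that assumption, not part of the original answer; the class and method names are illustrative:

```java
import java.util.Arrays;

class BoundarySplit {
    // Zero-width boundary: preceded by a letter, followed by a digit.
    // It cannot match at index 0, because nothing precedes the first character.
    static final String BOUNDARY = "(?<=[a-zA-Z])(?=\\d)";

    static String[] tokens(String line) {
        return line.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("1A2B")));   // [1A, 2B]
        System.out.println(Arrays.toString(tokens("12AB3C"))); // [12AB, 3C]
    }
}
```

Note that this only splits the string; unlike the original pattern, it does not check that each piece really has the digits-then-letters shape.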

Upvotes: 1

npinti

Reputation: 52185

That is because .split(String regex) uses the regular expression to mark where to break the string. So, in your case, if you have 1A2B£$%^& it will split at 1A and then again at 2B, which leaves two empty strings followed by £$%^&. With just 1A2B, every piece is an empty string, and since split omits trailing empty strings, you are left with an empty array.

Pattern/Matcher, on the other hand, matches the strings and puts them into groups. These groups can then be accessed at a later stage, as you are doing.
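As a rough illustration of the matching approach, the matches can be collected into a list instead of being printed. The class name TokenCollector and the tokens method are hypothetical helpers, not something from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCollector {
    private static final Pattern TOKEN = Pattern.compile("(\\d+)([a-zA-Z]+)");

    // Returns every digits+letters token found in the input, in order.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // m.group() is the whole match; m.group(1) holds the digits
            // and m.group(2) the letters, if they are needed separately.
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1A2B"));     // [1A, 2B]
        System.out.println(tokens("1A2B$%^&")); // [1A, 2B]
    }
}
```

Here the regex describes what to keep, whereas with split it describes what to throw away.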

Upvotes: 1
