user973479

Reputation: 1659

Java regex grouping

I have the following entry in a properties file:

some.key = \n
  [1:Some value] \n
  [14:Some other value] \n
  [834:Yet another value] \n

I am trying to parse it using a regular expression, but I can't seem to get the grouping correct. I am trying to print out a key/value for each entry. Example: Key="834", Value="Yet another value"

private static final String REGEX_PATTERN = "[(\\d+)\\:(\\w+(\\s)*)]+";

private void foo(String propValue){
    final Pattern p = Pattern.compile(REGEX_PATTERN);
    final Matcher m = p.matcher(propValue);
    while (m.find()) {
        final String key = m.group(0).trim();
        final String value = m.group(1).trim();
        System.out.println(String.format("Key[%s] Value[%s]", key, value));            
    }
}

The error I get is:

Exception: java.lang.IndexOutOfBoundsException: No group 1

I thought I was grouping correctly in the regex but I guess not. Any help would be appreciated!

Thanks

UPDATE: Escaping the brackets worked. I changed the pattern to the following. Thanks for the feedback!

 private static final String REGEX_PATTERN = "\\[(\\d+)\\:(\\w+(\\w|\\s)*)\\]+";

Upvotes: 0

Views: 2352

Answers (4)

Dan Manastireanu

Reputation: 1822

Try this:

private static final String REGEX_PATTERN = "\\[(\\d+):([\\w\\s]+)\\]";

final Pattern p = Pattern.compile(REGEX_PATTERN);
final Matcher m = p.matcher(propValue);
while (m.find()) {
    final String key = m.group(1).trim();
    final String value = m.group(2).trim();
    System.out.println(String.format("Key[%s] Value[%s]", key, value));
}
  1. The [ and ] need to be escaped, because unescaped they mark the start and end of a character class.
  2. group(0) is always the full match, so your capturing groups start at 1.
  3. Note how I wrote the second group: [\\w\\s]+. This is a character class matching one or more word or whitespace characters.
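
For completeness, here is a self-contained sketch of the three points above. The class name BracketEntryParser, the parse helper, and the sample input in main are made up for illustration; only the pattern and the group numbering come from this answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketEntryParser {
    // Literal '[' and ']' are escaped; group 1 is the numeric key, group 2 the value.
    private static final Pattern ENTRY = Pattern.compile("\\[(\\d+):([\\w\\s]+)\\]");

    // Returns the key/value pairs in the order they appear in the input.
    static Map<String, String> parse(String propValue) {
        Map<String, String> entries = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(propValue);
        while (m.find()) {
            // group(0) would be the whole "[key:value]" match, so start at group 1.
            entries.put(m.group(1).trim(), m.group(2).trim());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical input: the property value as one string.
        String propValue = "[1:Some value] [14:Some other value] [834:Yet another value]";
        parse(propValue).forEach(
            (k, v) -> System.out.println(String.format("Key[%s] Value[%s]", k, v)));
    }
}
```

Collecting into a LinkedHashMap is just one choice; it keeps the entries in file order and assumes keys are unique.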

Upvotes: 2

mathematical.coffee

Reputation: 56905

The problem is your regex: [ and ] are special characters and need to be escaped if you want them interpreted literally.

Try

"\\[(\\d+)\\:(\\w+(\\s)*)\\]"

Note: I removed the trailing '+'. Each call to m.find() moves on to the next substring that matches the pattern, so the + is not necessary. (Java has no global switch to set; the while (m.find()) loop already does that job.)

I can't help but feel this might be simpler without regex though, perhaps by splitting on \n or [ and then splitting on : for each of those.
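
A rough sketch of that split-based idea, in case it helps. The class name SplitEntryParser and the parse helper are hypothetical, and the code assumes values never contain '[' or ']'. Note that String.split also takes a regex, so the '[' still has to be escaped there.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitEntryParser {
    // Splits the property value into "[key:value]" chunks without a capturing regex.
    static Map<String, String> parse(String propValue) {
        Map<String, String> entries = new LinkedHashMap<>();
        // Split on the literal '[' that opens each entry.
        for (String chunk : propValue.split("\\[")) {
            int close = chunk.indexOf(']');
            if (close < 0) {
                continue; // text before the first '[' or a chunk with no closing ']'
            }
            // Split "key:value" at the first ':' only, so a ':' inside the value survives.
            String[] kv = chunk.substring(0, close).split(":", 2);
            if (kv.length == 2) {
                entries.put(kv[0].trim(), kv[1].trim());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical input mirroring the entries in the question.
        String propValue = "[1:Some value] \n [14:Some other value] \n [834:Yet another value] \n";
        parse(propValue).forEach(
            (k, v) -> System.out.println(String.format("Key[%s] Value[%s]", k, v)));
    }
}
```

Unlike the regex versions, this one does not check that the key is numeric; add that check if it matters.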

Upvotes: 1

jpaugh

Reputation: 7035

[ should be escaped (as well as ]).

"\\[(\\d+)....\\]+"

[] is used for character classes: [0-9] is equivalent to (0|1|2|...|9)
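
This also explains the "No group 1" error: inside an unescaped [...], the parentheses are ordinary class members, not groups. A small sketch to check that, using Matcher.groupCount(); the class name CharacterClassDemo and the groupCount helper are made up for illustration.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Number of capturing groups the regex engine sees in a pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Original pattern: everything between [ and ] is one character class.
        System.out.println(groupCount("[(\\d+)\\:(\\w+(\\s)*)]+"));    // prints 0
        // Escaped brackets: the parentheses are capturing groups again.
        System.out.println(groupCount("\\[(\\d+)\\:(\\w+(\\s)*)\\]")); // prints 3
        // A character class matches exactly one character from the set.
        System.out.println("7".matches("[0-9]"));                      // prints true
    }
}
```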

Upvotes: 2

AlexR

Reputation: 115328

Since your string spans several lines, you should tell Pattern about it:

final Pattern p = Pattern.compile(REGEX_PATTERN, Pattern.MULTILINE);

Although it is not directly relevant to your case, I'd recommend adding DOTALL too:

final Pattern p = Pattern.compile(REGEX_PATTERN, Pattern.MULTILINE | Pattern.DOTALL);
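
For context on what these flags do in java.util.regex: MULTILINE only changes where ^ and $ match, and DOTALL only changes what . matches. A pattern that uses none of those should find the same matches with or without them. The sketch below illustrates this; the class name FlagDemo, the countMatches helper, and the sample input are assumptions for the demo, and the bracket pattern is the escaped one from the other answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "[1:Some value]\n[14:Some other value]\n";
        String regex = "\\[(\\d+):([\\w\\s]+)\\]";

        // No ^, $ or . in the pattern, so the flags make no difference here.
        System.out.println(countMatches(Pattern.compile(regex), input)); // prints 2
        System.out.println(countMatches(
            Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL), input)); // prints 2

        // With an anchor, MULTILINE does matter: ^ then matches at every line start.
        System.out.println(countMatches(Pattern.compile("^\\["), input)); // prints 1
        System.out.println(countMatches(
            Pattern.compile("^\\[", Pattern.MULTILINE), input)); // prints 2
    }
}
```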

Upvotes: 0
