Madhan

Reputation: 5818

Regex to identify groups

I have the following string

@name Home @options {} @include h1,h2,h3 @exclude p,div,em

I want to split by regex and store it in a HashMap like

@name->Home
@options->{}
@include->h1,h2,h3
@exclude->p,div,em

I used the regex below, but it matches the entire string after @name:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NewClass {

    public static void main(String[] args) {
        String regex = "((?<var>@(\\S)+) (?<val>.+) *)+";

        String val = "@name Home @options {} @include h1,h2,h3 @exclude p,div,em";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(val);
        if (matcher.matches()) {
            System.out.println(matcher.group("var"));
            System.out.println(matcher.group("val"));
        }
    }
}

It outputs:

@name
Home @options {} @include h1,h2,h3 @exclude p,div,em

Upvotes: 3

Views: 122

Answers (4)

Thomas

Reputation: 88707

The problem with your regex is that you don't know the number of groups in your input, i.e. how many @xxx groups there are. Thus you'll need to apply the regex multiple times, i.e. using a while-loop and matcher.find():

while (matcher.find()) {
  System.out.println(matcher.group("var"));
  System.out.println(matcher.group("val"));
}

That said, your regex only needs to match a single group. Assuming nothing else appears between the groups, you basically match from one @ to the next @ or to the end of the input. Hence your expression could become (?<var>@(\S)+) (?<val>[^@]+).

That expression basically consists of two parts with a single space in between (you might want to change that to \s+ instead):

  • (?<var>@(\S)+) matches the group name, starting with @ and continuing with anything that is not whitespace. Note that the inner group is not needed here, so just use \S+, unless you want to extract the name without the @.
  • (?<val>[^@]+) matches any sequence of at least one character that's not a @, i.e. anything up to the next @ or the end of the input. The captured value includes the whitespace before the next @, so you may want to trim it. Note that you won't match empty groups that way, so if you want to match those as well, you might want to change the quantifier to * instead.
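Putting the loop and the single-group expression together, a minimal sketch could look like the following. The class name DirectiveParser and the method parse are made up for illustration. It uses \s+ instead of the single space, drops the inner group, trims each value, and assumes that values never contain a @ themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class DirectiveParser {

    // One "@key value" pair: the key is @ followed by non-whitespace,
    // the value is everything up to the next @ or the end of the input.
    private static final Pattern PAIR =
            Pattern.compile("(?<var>@\\S+)\\s+(?<val>[^@]+)");

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps the pairs in the order they appear in the input.
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        // find() moves to the next match on every call.
        while (matcher.find()) {
            // [^@]+ also captures the whitespace before the next @, so trim it.
            result.put(matcher.group("var"), matcher.group("val").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String val = "@name Home @options {} @include h1,h2,h3 @exclude p,div,em";
        parse(val).forEach((k, v) -> System.out.println(k + "->" + v));
    }
}
```

If a value may legitimately contain a @ (an e-mail address, for example), this expression would cut it short, and a different delimiter rule would be needed.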

Upvotes: 2

Wiktor Stribiżew

Reputation: 626802

Use the regex (?<var>@\S+)\s+(?<val>\S+) and, instead of .matches(), which requires a full string match, use while (matcher.find()):

String regex = "(?<var>@\\S+)\\s+(?<val>\\S+)";
String val = "@name Home @options {} @include h1,h2,h3 @exclude p,div,em";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(val);
Map<String, String> m = new HashMap<String, String>();
while (matcher.find()) {
    m.put(matcher.group("var"), matcher.group("val"));
}
System.out.println(m); // => {@name=Home, @exclude=p,div,em, @include=h1,h2,h3, @options={}}

See the Java demo

Upvotes: 1

GhostCat

Reputation: 140427

Why use regexes for everything?

Just saying: a simple parser that splits on "@" might lead to code that is easier to understand.

That will result in an array of "var value" strings; in each of them, the part before the first space is the name and the substring after the first space is the value.

You see - you need other people to come up with a "correct" regex. That probably means that you have to turn to other people every time you want to enhance/rework/update that regex.
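A rough sketch of that split-based idea, assuming the input always has the "@key value" shape and that neither keys nor values contain a @. The class name SplitParser and the method parse are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class illustrating the split-on-@ approach.
class SplitParser {

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String chunk : input.split("@")) {
            chunk = chunk.trim();
            if (chunk.isEmpty()) {
                continue; // the piece before the first @ is empty
            }
            int space = chunk.indexOf(' ');
            // Everything before the first space is the key, the rest is the value.
            String key = space < 0 ? chunk : chunk.substring(0, space);
            String value = space < 0 ? "" : chunk.substring(space + 1).trim();
            // split() removed the @, so put it back on the key.
            result.put("@" + key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String val = "@name Home @options {} @include h1,h2,h3 @exclude p,div,em";
        parse(val).forEach((k, v) -> System.out.println(k + "->" + v));
    }
}
```

Note that String.split still takes a regex as its argument, but a plain "@" has no special meaning there.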

Upvotes: 0

Jan

Reputation: 43169

What about:

(@[^@]+)

See a demo on regex101.com.
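Each match of that pattern is a whole "@key value" chunk, so the key and value still have to be separated afterwards. One possible way to do that in Java, as a sketch only (the class name ChunkParser and the method parse are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around the (@[^@]+) pattern.
class ChunkParser {

    // Matches from one @ up to, but not including, the next @.
    private static final Pattern CHUNK = Pattern.compile("(@[^@]+)");

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = CHUNK.matcher(input);
        while (matcher.find()) {
            // Limit 2: split only at the first run of whitespace.
            String[] parts = matcher.group(1).trim().split("\\s+", 2);
            result.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return result;
    }

    public static void main(String[] args) {
        String val = "@name Home @options {} @include h1,h2,h3 @exclude p,div,em";
        parse(val).forEach((k, v) -> System.out.println(k + "->" + v));
    }
}
```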

Upvotes: 0
