Cjxcz Odjcayrwl

Reputation: 22847

How to match a substring that ends with a comma or the end of the line?

I want to parse out all attributes from an LDAP distinguished name. Each attribute starts at a comma or at the beginning of the line, and ends at a comma or at the end of the line.

I've written the following:

    String patternStr = "[^,][A-Z]+=([A-Za-z0-9]+)[,$]";
    String str = "CN=USERID003,OU=Users,DC=intern,DC=mycompany,DC=pl";
    Pattern pattern = Pattern.compile(patternStr);
    Matcher m = pattern.matcher(str);
    while (m.find()) {
        String substr = str.substring(m.start(), m.end());
        System.out.println(substr);
        System.out.println(m.group(1));
    }

The output is:

CN=USERID003,
USERID003
OU=Users,
Users
DC=intern,
intern
DC=mycompany,
mycompany

Matching the start with [^,] works correctly, but the block [,$] matches only commas, not the end of the line.

How can I match both a comma and the end of the line as the end of the substring?

Upvotes: 0

Views: 4715

Answers (7)

jawee

Reputation: 271

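You could change your pattern "[^,][A-Z]+=([A-Za-z0-9]+)[,$]"
to "(?:^|,)[A-Z]+=([A-Za-z0-9]+)(?=,|$)"

The trailing delimiter is written as a lookahead (?=,|$) rather than a consuming group (?:,|$). A consumed comma would no longer be available as the leading delimiter of the next attribute, so every second attribute would be skipped. With the lookahead, you get your desired results.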

I guess the problem with your original pattern is this:
Inside a character class [...], only characters are included, not boundary matchers.
So [^,] means any character except ',', while [,$] means the character ',' or the literal character '$'. The '$' has no boundary-matcher meaning there.
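
A small sketch can make the difference visible. It only checks whether a regex matches anywhere in an input; the class name CharClassDemo and the helper found are made up for this illustration and are not part of the original post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original question.
class CharClassDemo {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Inside [...] the '$' is an ordinary character, so this needs
        // a literal ',' or '$' in the input.
        System.out.println(found("[,$]", "DC=pl"));    // false
        System.out.println(found("[,$]", "DC=p$l"));   // true
        // Outside a character class, '$' is the end-of-input anchor.
        System.out.println(found("(?:,|$)", "DC=pl")); // true
    }
}
```

This is why the last attribute, which is followed by neither ',' nor a literal '$', never matched in the question.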

Upvotes: 0

anubhava

Reputation: 784998

You can use this lookbehind-based regex for matching:

(?<=,|^)([^=]+)=([^,]*)

Code:

    String patternStr = "(?<=,|^)([^=]+)=([^,]*)";
    String str = "CN=USERID003,OU=Users,DC=intern,DC=mycompany,DC=pl";
    Pattern pattern = Pattern.compile(patternStr);
    Matcher m = pattern.matcher(str);
    while (m.find()) {
        System.out.printf("%s : %s%n", m.group(1), m.group(2));
    }

Output:

CN : USERID003
OU : Users
DC : intern
DC : mycompany
DC : pl

Upvotes: 1

HMK

Reputation: 574

Please try this:

[^,][A-Z]+=([A-Za-z0-9]+)(?:,|(?=$))

Upvotes: 1

Eoin

Reputation: 833

I would advise you to forget about the pattern and matcher and use String.split() instead - it gives all the functionality that you want and the code is more readable.

    String str = "CN=USERID003,OU=Users,DC=intern,DC=mycompany,DC=pl";
    // Split the DN into "NAME=value" attributes.
    String[] attrs = str.split(",");
    for (String attr : attrs) {
        System.out.println(attr);
        // The part after '=' is the attribute value.
        System.out.println(attr.split("=")[1]);
    }

Hope this helps!

Upvotes: 2

Jack

Reputation: 5768

Regex: (.+?,)|(.+)

Check it out here: http://www.regexr.com/399a7

Upvotes: 0

ChoiBedal

Reputation: 111

Why not use str.split()? Split on "," first, loop over the resulting "XX=YYYY" parts with a for loop, and then split each part again on "=" if you only need the attribute name or its value.
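
A minimal sketch of that two-step split, assuming the values contain no escaped commas (real DNs may contain sequences such as "\,", which this does not handle). The class name DnSplitDemo and the method parse are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not from the original question.
class DnSplitDemo {
    // Splits "NAME=value,NAME=value,..." into [name, value] pairs.
    // Assumption: no escaped commas inside values.
    static List<String[]> parse(String dn) {
        List<String[]> pairs = new ArrayList<>();
        for (String attr : dn.split(",")) {
            // Limit 2 keeps any further '=' characters inside the value.
            String[] kv = attr.split("=", 2);
            if (kv.length == 2) {
                pairs.add(kv);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String str = "CN=USERID003,OU=Users,DC=intern,DC=mycompany,DC=pl";
        for (String[] kv : parse(str)) {
            System.out.println(kv[0] + " : " + kv[1]);
        }
    }
}
```

A list of pairs is used rather than a map because a DN can repeat the same attribute name, as DC does here.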

Upvotes: 1

laune

Reputation: 31290

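This should do what you want according to your description:

    String patternStr = "(?:^|,)[A-Z]+=([A-Za-z0-9]+)(?=,|$)";

The match starts at the beginning of the line/string or at a comma, and must be followed by a comma or the end of the line/string. The trailing delimiter is a lookahead, so the comma is not consumed and is still available as the leading delimiter of the next attribute.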
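
To check how the trailing delimiter behaves, the sketch below runs two variants over the sample DN from the question: one ending in the consuming group (?:,|$) and one ending in the lookahead (?=,|$). The class name TrailingDelimiterDemo and the helper values are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original question.
class TrailingDelimiterDemo {
    // Collects group(1) of every match of regex in input.
    static List<String> values(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "CN=USERID003,OU=Users,DC=intern,DC=mycompany,DC=pl";
        // Consuming variant: the comma is eaten, so the next attribute
        // has no leading delimiter left and is skipped.
        System.out.println(values("(?:^|,)[A-Z]+=([A-Za-z0-9]+)(?:,|$)", str));
        // prints [USERID003, intern, pl]
        // Lookahead variant: the comma is only checked, not consumed.
        System.out.println(values("(?:^|,)[A-Z]+=([A-Za-z0-9]+)(?=,|$)", str));
        // prints [USERID003, Users, intern, mycompany, pl]
    }
}
```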

Upvotes: 3
