kojoma

Reputation: 313

RegEx: get first occurrence of pattern

I've got a string that looks like this:

[...] <a rel=\"nofollow\" class=\"username offline popupctrl\" href=\"http://....html\" title=\"T3XTT0F1ND is offline\" id=\"...\">\">\">\">"[...]

If I set the pattern to

"<a rel=\"nofollow\" (.+) id=\"(.+)(?=\")"

I get T3XTT0F1ND">">"> instead of just T3XTT0F1ND in Groups[2].Value. How can I write the regex so that it finds not only the first possible occurrence of 'a rel="nofollow"...' but also the first possible match for 'id="'?

Upvotes: 0

Views: 1237

Answers (2)

ridgerunner

Reputation: 34385

This works for A tags where the ID attribute always follows the REL attribute. The ID value is captured into capture group 1:

// Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
// subjectString is assumed to hold the HTML text to search.
var resultList = new List<string>();
Regex regexObj = new Regex(
    @"<a\b                 # Open start tag delimiter
      [^>]*?               # Everything up to REL attrib
      \b rel=""nofollow""  # REL attrib.
      [^>]*?               # Everything up to ID attrib
      \b id=""([^""]*)""   # $1: ID attrib.
      [^>]*                # Everything up to end of start tag.
    >                      # Close start tag delimiter", 
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
// Collect group 1 (the ID value) from every matching A tag.
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value);
    matchResult = matchResult.NextMatch();
}
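For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea with the pattern written on one line. The class name IdExtractor, the method ExtractIds, the example.com URL and the id value in the sample are all assumptions made up for illustration; the question elides the real href and id.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Same idea as the commented pattern: [^>]*? keeps the search inside one
    // start tag, and [^"]* ends the id value at its closing quote.
    private static readonly Regex AnchorId = new Regex(
        @"<a\b[^>]*?\brel=""nofollow""[^>]*?\bid=""([^""]*)""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the id value of every matching A tag, in document order.
    public static List<string> ExtractIds(string html)
    {
        var ids = new List<string>();
        foreach (Match m in AnchorId.Matches(html))
        {
            ids.Add(m.Groups[1].Value);
        }
        return ids;
    }

    public static void Main()
    {
        // Hypothetical sample: the URL and the id value are invented here.
        string sample =
            "<a rel=\"nofollow\" class=\"username offline popupctrl\" " +
            "href=\"http://example.com/user.html\" " +
            "title=\"T3XTT0F1ND is offline\" id=\"T3XTT0F1ND\">\">\">\">\"";

        foreach (string id in ExtractIds(sample))
        {
            Console.WriteLine(id);
        }
    }
}
```

Because neither [^>] nor [^"] can cross the characters that end the tag or the attribute value, the trailing "> debris after the tag never ends up in the captured group.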

Upvotes: 0

MPękalski

Reputation: 7103

Shouldn't you add one more capture group, (), for the title, like this:

<a rel=\"nofollow\" (.+) title=\"(.+)\" id=\"(.+)(?=\")

With this pattern, Groups[2] would return T3XTT0F1ND is offline.
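A quick way to check which group holds what is to run the three-group pattern against a sample line. The sketch below assumes a made-up URL and id value (both are elided in the question), and the names TitleGroupDemo and FirstMatch are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleGroupDemo
{
    // The three-group pattern, written as a verbatim string
    // (a doubled quote stands for one literal quote).
    public const string Pattern =
        @"<a rel=""nofollow"" (.+) title=""(.+)"" id=""(.+)(?="")";

    public static Match FirstMatch(string input)
    {
        return Regex.Match(input, Pattern);
    }

    public static void Main()
    {
        // Hypothetical sample: the URL and the id value are invented here.
        string sample =
            "<a rel=\"nofollow\" class=\"username offline popupctrl\" " +
            "href=\"http://example.com/user.html\" " +
            "title=\"T3XTT0F1ND is offline\" id=\"T3XTT0F1ND\">\">\">\">\"";

        Match m = FirstMatch(sample);
        if (m.Success)
        {
            // Group 2 is the title text.
            Console.WriteLine(m.Groups[2].Value);
            // Group 3 is still greedy, so it holds the id plus the
            // characters that follow it, up to the last quote on the line.
            Console.WriteLine(m.Groups[3].Value);
        }
    }
}
```

The title group comes out clean here only because the literal text " id=" that follows it occurs once on the line; the id group has no such anchor and keeps over-capturing.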

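Or did you mean that your id equals T3XTT0F1ND and Groups[2] captures more than that? If so, the greedy (.+) in the id group is the cause: it runs on to the last quote it can reach. Restricting that group to non-quote characters stops it at the first closing quote:

<a rel=\"nofollow\" (.+) id=\"([^\"]+)\"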
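To see where the extra characters come from, it can help to compare a greedy group, a lazy group and a negated character class on the same input. This is only a sketch: the sample line and its id value are invented (the question elides the real id), and GreedyVsLazy and FirstId are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns group 1 of the first match, or an empty string if nothing matches.
    public static string FirstId(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical sample: the id value is invented here.
        string sample =
            "<a rel=\"nofollow\" title=\"T3XTT0F1ND is offline\" " +
            "id=\"T3XTT0F1ND\">\">\">\">\"";

        // Greedy: .+ runs to the end of the line, then backs up to the last quote.
        Console.WriteLine(FirstId(sample, "id=\"(.+)(?=\")"));

        // Lazy: .+? grows one character at a time and stops at the first quote.
        Console.WriteLine(FirstId(sample, "id=\"(.+?)\""));

        // Negated class: the group cannot contain a quote at all.
        Console.WriteLine(FirstId(sample, "id=\"([^\"]*)\""));
    }
}
```

The lazy and negated-class variants both stop at the first closing quote. The negated class is usually the more predictable of the two, since it cannot run past a quote even when the rest of the pattern forces backtracking.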

Upvotes: 1
