Regular Expression match specific tag

Question

I wrote the following regular expression:

$$.*content.*$$.*

It works fine, but sometimes it matches tags I don't need.

I have a string from a word-processing document like this:

Job Description: Irrigation/Maintenance WorkerRancho has reviewed the duties described within this job description to ensure that essential functions and basic duties are included.  It is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities required of an incumbent.  An incumbent may be asked to perform other duties as required or assigned by their supervisor.  [content]

I need to select only the tag that contains

[content]

But this expression also matches extra tags that do not contain the required text.

Can anyone help me?

Wiktor Stribiżew · Accepted Answer

If you have a whole XML file to deal with, it is advisable to use an XML parser. If you only have this short fragment and need it for a one-off task, you may use either of the two regex approaches below.
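For comparison, here is a minimal sketch of the XML-parser route using LINQ to XML (System.Xml.Linq). It assumes the markup is WordprocessingML, so the w: prefix is bound to the http://schemas.openxmlformats.org/wordprocessingml/2006/main namespace; the sample string and the wrapping <root> element are hypothetical, added only so the fragment parses as well-formed XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: two w:p paragraphs wrapped in a root element that
// declares the WordprocessingML "w" namespace so the fragment is valid XML.
const string xml =
    "<root xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
    "<w:p w:rsidR=\"1\"><w:r><w:t>Job Description: Irrigation/Maintenance Worker</w:t></w:r></w:p>" +
    "<w:p w:rsidR=\"2\"><w:r><w:t>[content]</w:t></w:r></w:p>" +
    "</root>";

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
XDocument doc = XDocument.Parse(xml);

// Keep only the w:p elements whose concatenated descendant text contains "[content]".
var matches = doc.Descendants(w + "p")
    .Where(p => p.Value.Contains("[content]"))
    .ToList();

foreach (var m in matches)
    Console.WriteLine(m);
```

XElement.Value concatenates all descendant text, so this check still works if Word splits [content] across several w:t runs, a case a regex over the raw markup would miss.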

Extract all w:p elements, check which ones contain [content], and return only those substrings:

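using System.Linq;
using System.Text.RegularExpressions;

// s holds the input XML string
var result = Regex.Matches(s, @"(?s)<w:p\b[^>]*>(.*?)</w:p>")
    .Cast<Match>()
    .Where(x => x.Groups[1].Value.Contains("[content]"))
    .Select(z => z.Value);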

Note that here, (?s)<w:p\b[^>]*>(.*?)</w:p> matches <w:p, then uses the \b word boundary to assert that no word char immediately follows (so start tags such as <w:pPr> are not matched), then matches the rest of the start tag by consuming 0+ chars other than > followed by >, then captures any 0+ chars, as few as possible, into Group 1 (x.Groups[1].Value), and finally matches </w:p>. The .Where(x => x.Groups[1].Value.Contains("[content]")) condition keeps only the matches that contain [content] in the inner XML of the w:p element.
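A self-contained version of this first approach might look like the sketch below. The three-paragraph input s is hypothetical (it is not the question's actual data); only the middle paragraph contains [content].

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical WordprocessingML-like input: three w:p paragraphs,
// only the second of which contains the [content] marker.
string s =
    "<w:p w:rsidR=\"1\"><w:r><w:t>Job Description</w:t></w:r></w:p>" +
    "<w:p w:rsidR=\"2\"><w:r><w:t>[content]</w:t></w:r></w:p>" +
    "<w:p w:rsidR=\"3\"><w:r><w:t>Other text</w:t></w:r></w:p>";

// Capture each paragraph's inner XML in Group 1, then filter in C#.
var hits = Regex.Matches(s, @"(?s)<w:p\b[^>]*>(.*?)</w:p>")
    .Cast<Match>()
    .Where(x => x.Groups[1].Value.Contains("[content]"))
    .Select(z => z.Value)
    .ToList();

foreach (var h in hits)
    Console.WriteLine(h);
```

The lazy (.*?) stops at the first </w:p>, so each match covers exactly one paragraph and the filtering is done outside the regex.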



Alternatively, use a more sophisticated regex with a tempered greedy token:

(?s)<w:p\b[^>]*>(?:(?!</?w:p\b).)*?\[content].*?</w:p>


Details

(?s) - a RegexOptions.Singleline inline option (makes . match newline chars, too)
<w:p - a <w:p substring
\b - word boundary
[^>]* - 0+ chars other than >
> - a >
(?:(?!</?w:p\b).)*? - any char, 0+ times but as few as possible, that is not the starting point of a <w:p or </w:p substring followed by a word boundary
\[content] - a [content] substring
.*? - any 0+ chars, as few as possible
</w:p> - a literal </w:p> substring
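To illustrate why the tempered greedy token matters, the sketch below compares it with a naive greedy pattern on a hypothetical three-paragraph input. The naive pattern is an assumed example of an overmatching regex, not a reconstruction of the one in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: only the second paragraph contains [content].
string s =
    "<w:p w:rsidR=\"1\"><w:r><w:t>Job Description</w:t></w:r></w:p>" +
    "<w:p w:rsidR=\"2\"><w:r><w:t>[content]</w:t></w:r></w:p>" +
    "<w:p w:rsidR=\"3\"><w:r><w:t>Other text</w:t></w:r></w:p>";

// Naive pattern: greedy .* runs from the first <w:p to the last </w:p>,
// so it swallows paragraphs that do not contain [content].
string naive = Regex.Match(s, @"(?s)<w:p\b.*\[content].*</w:p>").Value;

// Tempered greedy token: (?:(?!</?w:p\b).)*? never crosses a paragraph
// start or end tag, so the match stays inside the paragraph with [content].
string tempered = Regex.Match(s, @"(?s)<w:p\b[^>]*>(?:(?!</?w:p\b).)*?\[content].*?</w:p>").Value;

Console.WriteLine(naive);
Console.WriteLine(tempered);
```

Because the naive .* may cross </w:p> boundaries, it returns the whole string. The tempered token fails as soon as it reaches a paragraph boundary without seeing [content], so the engine moves on and only succeeds at the paragraph that actually contains it.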
