Reputation: 4963
I have this HTML
<br />
<strong>Name:</strong> Josef
<br />
And I want to match the name, "Josef" in this case. The problem is that Josef is the last word on its line, not the last word of the whole string, so $ doesn't match after it unless the regex uses the m (multiline) option. My approach
^<strong>Name:</strong> (.*?)$
doesn't seem to work. How is this done correctly?
Upvotes: 1
Views: 2203
Reputation: 75222
Instead of using multiline mode to make the anchors work right, I would ditch the anchors:
<strong>Name:</strong>\s*([^\r\n<]+)
HTML is not a line-based format, so it doesn't really make sense to use line anchors in it. That piece of text may be on its own line today, but tomorrow it could get edited and the newlines removed; it would still be valid HTML and it would still be rendered exactly the same.
Another potential problem is that the newlines could be \r\n (carriage-return + linefeed) instead of just \n. The .NET regex flavor doesn't recognize \r as (part of) a line separator, so the $ will match the position between the \r and the \n, and the \r will get captured along with the name (i.e. "Josef\r").
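A minimal sketch of both points, assuming the input uses \r\n line endings. The class name, method names, and sample string are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class NameDemo
{
    // Anchored pattern in multiline mode: $ matches just before \n,
    // so with \r\n line endings the \r ends up inside the capture.
    public static string Anchored(string html) =>
        Regex.Match(html, @"^<strong>Name:</strong> (.*?)$", RegexOptions.Multiline).Groups[1].Value;

    // Anchor-free pattern: the character class stops at \r, \n or <.
    public static string AnchorFree(string html) =>
        Regex.Match(html, @"<strong>Name:</strong>\s*([^\r\n<]+)").Groups[1].Value;

    static void Main()
    {
        // Hypothetical input with Windows-style line endings.
        string html = "<br />\r\n<strong>Name:</strong> Josef\r\n<br />\r\n";

        Console.WriteLine(Anchored(html).Length);      // 6: "Josef" plus the trailing '\r'
        Console.WriteLine("|{0}|", AnchorFree(html));  // |Josef|
    }
}
```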
Upvotes: 0
Reputation: 31596
If you just want Josef, why not use the RightToLeft regex option to tell the parser to start at the end and work towards the beginning? The pattern is still written left to right, and it translates to this:
string data =@"
<br />
<strong>Name:</strong> Josef
<br />
";
string pattern = @"\</strong\>\s+([^\r\n]+)";
// Put in | | to show no whitespace leakage.
Console.WriteLine ("|{0}|", Regex.Match(data, pattern, RegexOptions.RightToLeft).Groups[1].Value);
// Outputs
// |Josef|
Upvotes: 0
Reputation: 112299
You can use this regex pattern which finds a position following a prefix:
(?<=prefix)find
In your case
(?<=^<strong>Name:</strong> ).*$
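With RegexOptions.Multiline set, it will find exactly "Josef" and you will not need to use groups. (If the lines end in \r\n, the \r is matched too, so trim the result.) But consider using the Html Agility Pack for searches within HTML.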
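A short sketch of the lookbehind approach, assuming RegexOptions.Multiline is set so that ^ and $ match at line boundaries. The class and method names are hypothetical, and the TrimEnd call is an added safeguard for \r\n input:

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    public static string FindName(string html)
    {
        // Multiline lets ^ and $ match at line boundaries inside the string.
        Match m = Regex.Match(html, @"(?<=^<strong>Name:</strong> ).*$", RegexOptions.Multiline);

        // A failed match has an empty Value; TrimEnd drops a stray '\r'
        // when the input uses \r\n line endings.
        return m.Value.TrimEnd('\r');
    }

    static void Main()
    {
        string html = "<br />\n<strong>Name:</strong> Josef\n<br />\n";
        Console.WriteLine("|{0}|", FindName(html));  // |Josef|
    }
}
```

The whole match is the name here, so m.Value is used and no capture group is needed.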
Upvotes: 0
Reputation: 52
Dear Matthias Waldkircher,
Two solutions:
1) Using your expression:
"(?:^|\n)<strong>Name:</strong> (.*?)(?:$|\r)"
2) With other expression:
"</strong>\s(.*?)(?:\r|$)"
In both solutions your desired match will be in this property of the match object: match.Groups[1].Value.
MetaChars used:
(?:) // unnamed/unnumbered (non-capturing) group;
\n // new line;
\r // carriage return;
^ // beginning of the input;
| // or
() // numbered (capturing) group;
$ // end of the input.
I wish you the best,
Sincerely,
Upvotes: 0
Reputation: 50104
If your HTML string has two literal linebreaks in it, as it seems to, you'll need to set your regex to multiline mode so that $ matches end-of-line as well as end-of-string.
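As a rough illustration, here is the pattern from the question run with and without multiline mode. This sketch assumes plain \n line endings, and the class and method names are invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // The pattern from the question, unchanged.
    const string Pattern = @"^<strong>Name:</strong> (.*?)$";

    public static Match Find(string html, RegexOptions options) =>
        Regex.Match(html, Pattern, options);

    static void Main()
    {
        string html = "<br />\n<strong>Name:</strong> Josef\n<br />\n";

        // Default mode: ^ only matches at the very start of the string.
        Console.WriteLine(Find(html, RegexOptions.None).Success);                  // False

        // Multiline mode: ^ and $ also match at the start and end of each line.
        Console.WriteLine(Find(html, RegexOptions.Multiline).Groups[1].Value);     // Josef
    }
}
```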
Upvotes: 2
Reputation: 32787
You should use an HTML parser instead of regex. But if you still need a regex, you can do
<strong>Name:</strong>\s*(\w+)
Upvotes: 0