maddo7

Reputation: 4963

Regex Match the last word of the line

I have this HTML

<br />
<strong>Name:</strong> Josef
<br />

And I want to match the name, "Josef" in this case. The trouble is that "Josef" ends a line rather than the whole string, so it only counts as the last word if the regex uses the m (multiline) flag. My approach

^<strong>Name:</strong> (.*?)$

doesn't seem to work. How is this done correctly?

Upvotes: 1

Views: 2203

Answers (6)

Alan Moore

Reputation: 75222

Instead of using multiline mode to make the anchors work right, I would ditch the anchors:

<strong>Name:</strong>\s*([^\r\n<]+)

HTML is not a line-based format, so it doesn't really make sense to use line anchors in it. That piece of text may be on its own line today, but tomorrow it could get edited and the newlines removed; it would still be valid HTML and it would still be rendered exactly the same.

Another potential problem is that the newlines could be \r\n (carriage-return + linefeed) instead of just \n. The .NET regex flavor doesn't recognize \r as (part of) a line separator, so the $ will match the position between the \r and the \n, and the \r will get captured along with the name (i.e. "Josef\r").
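To make the \r issue concrete, here is a minimal sketch that runs both patterns over the same input. The sample string and the variable names are assumptions made for illustration; only the two patterns come from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with Windows-style (\r\n) line endings.
string html = "<br />\r\n<strong>Name:</strong> Josef\r\n<br />\r\n";

// Anchored pattern from the question, run in multiline mode.
// "." matches \r and "$" only matches just before \n, so the \r is captured.
string anchored = Regex.Match(html, @"^<strong>Name:</strong> (.*?)$",
                              RegexOptions.Multiline).Groups[1].Value;

// Anchor-free pattern: the character class stops at \r, \n, or "<".
string unanchored = Regex.Match(html, @"<strong>Name:</strong>\s*([^\r\n<]+)")
                         .Groups[1].Value;

Console.WriteLine("anchored length:   {0}", anchored.Length);   // 6 ("Josef" plus \r)
Console.WriteLine("unanchored length: {0}", unanchored.Length); // 5
```

Printing the lengths rather than the values makes the stray \r visible, since it would otherwise be invisible in console output.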

Upvotes: 0

ΩmegaMan

Reputation: 31596

If you just want Josef, why not use the RightToLeft regex option to hint to the parser that it should start at the end and work towards the beginning? The pattern is still written left to right, and it translates to this:

using System;
using System.Text.RegularExpressions;

string data = @"
<br />
<strong>Name:</strong> Josef
<br />
";

string pattern = @"\</strong\>\s+([^\r\n]+)";

// Put in | | to show no whitespace leakage.
Console.WriteLine ("|{0}|", Regex.Match(data, pattern, RegexOptions.RightToLeft).Groups[1].Value);

// Outputs
// |Josef|

Upvotes: 0

Olivier Jacot-Descombes

Reputation: 112299

You can use this regex pattern which finds a position following a prefix:

(?<=prefix)find

In your case

(?<=^<strong>Name:</strong> ).*$

It will find exactly "Josef" (given the Multiline option, since the pattern relies on ^ and $ matching at line boundaries) and you will not need to use groups. But consider using the Html Agility Pack for searches within HTML.
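As a rough sketch of the lookbehind approach in C#, assuming the question's HTML with plain \n line endings (the sample string and variable names are illustrative, not from the original post):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input modeled on the question, assuming \n line endings.
string html = "<br />\n<strong>Name:</strong> Josef\n<br />\n";
string pattern = @"(?<=^<strong>Name:</strong> ).*$";

// With Multiline, ^ and $ match at line boundaries, so the whole match is the name.
Match withMultiline = Regex.Match(html, pattern, RegexOptions.Multiline);

// Without it, ^ only matches at the very start of the string, so nothing is found.
Match withoutMultiline = Regex.Match(html, pattern);

Console.WriteLine("|{0}|", withMultiline.Value);   // |Josef|
Console.WriteLine(withoutMultiline.Success);       // False
```

Because the prefix sits inside the lookbehind, the name is the entire match (Match.Value) and no capture group is needed. With \r\n line endings the match would likely still include a trailing \r, so trimming the result may be worthwhile.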

Upvotes: 0

Castello

Reputation: 52

Dear Matthias Waldkircher,

Two solutions:

1) Using your expression:

"(?:^|\n)<strong>Name:</strong> (.*?)(?:$|\r)"

2) With another expression:

"</strong>\s(.*?)(?:\r|$)"

In both solutions, your desired match will be in this property of the match object: match.Groups[1].Value.

MetaChars used:

(?:) // unnamed/unnumbered (non-capturing) group;
\n // new line;
\r // carriage return;
^ // beginning of the input;
| // or;
() // numbered (capturing) group;
$ // end of the input.
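A small sketch for trying both expressions, with one caveat worth checking: since $ means end of input here and the alternative is \r, they appear to depend on \r\n line endings. The sample strings and variable names below are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs: the same HTML with \r\n and with \n line endings.
string crlf = "<br />\r\n<strong>Name:</strong> Josef\r\n<br />\r\n";
string lf = crlf.Replace("\r\n", "\n");

string first = @"(?:^|\n)<strong>Name:</strong> (.*?)(?:$|\r)";
string second = @"</strong>\s(.*?)(?:\r|$)";

foreach (string pattern in new[] { first, second })
{
    Match onCrlf = Regex.Match(crlf, pattern);
    Match onLf = Regex.Match(lf, pattern);

    // On \r\n input the name is captured; on \n-only input neither \r nor
    // end-of-input follows the name, so the match fails.
    Console.WriteLine("CRLF: |{0}|  LF matched: {1}", onCrlf.Groups[1].Value, onLf.Success);
}
```

If the input might use \n only, passing RegexOptions.Multiline (so that $ also matches before a \n) should cover that case.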

I wish you the best,

Sincerely,

Upvotes: 0

Rawling

Reputation: 50104

If your HTML string has two literal line breaks in it, as it seems to, you'll need to set your regex to multiline mode so that $ matches at end-of-line as well as end-of-string.
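For illustration, a minimal sketch of the difference, assuming the question's HTML arrives as a single string with \n line breaks (the sample string and variable names are made up for the example):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input modeled on the question, assuming \n line endings.
string html = "<br />\n<strong>Name:</strong> Josef\n<br />\n";
string pattern = @"^<strong>Name:</strong> (.*?)$";

// Default mode: ^ matches only at the start of the whole string.
Match single = Regex.Match(html, pattern);

// Multiline mode: ^ and $ also match at the start and end of each line.
Match multi = Regex.Match(html, pattern, RegexOptions.Multiline);

Console.WriteLine(single.Success);         // False
Console.WriteLine(multi.Groups[1].Value);  // Josef
```

The inline form (?m) at the start of the pattern should have the same effect as passing RegexOptions.Multiline.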

Upvotes: 2

Anirudha

Reputation: 32787

You should use an HTML parser instead of regex.

But if you still need a regex, you can do:

<strong>Name:</strong>\s*(\w+)
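A quick sketch of how that pattern behaves, with one thing to watch for: \w+ stops at the first non-word character. The second name in the sample input is hypothetical and only there to show that limit.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the question's line plus a made-up hyphenated name.
string html = "<strong>Name:</strong> Josef\n<strong>Name:</strong> Jean-Luc\n";

MatchCollection matches = Regex.Matches(html, @"<strong>Name:</strong>\s*(\w+)");

foreach (Match m in matches)
{
    // Prints "Josef", then "Jean" (the hyphen ends the \w+ run).
    Console.WriteLine(m.Groups[1].Value);
}
```

So this works for single-word names like the one in the question, but names containing hyphens, spaces, or apostrophes would be cut short.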

Upvotes: 0
