moorara

Reputation: 4226

A regular expression for an HTML anchor tag in C#?

I need a regular expression in C# that matches an anchor tag in HTML source, as general as possible. Consider this HTML:

<a id="[constant]"
      href="[specific]"
    >GlobalPlatform Card Specification 2.2
    March, 2006</a>

Here [constant] is a fixed string, so it poses no problem. [specific] is a simple, specific address, so the regular expression for it is simple too. The main problem is that I cannot handle the newline character in the middle of the anchor's title. I previously wrote the regular expression below, which works well except when the title contains a newline.

<a[\\s\\n\\r]+id=\"[constant]\"[\\s\\n\\r]+href=\"[specific]\"[\\s\\n\\r]*>[\\s\\n\\r]*[^\\n\\r]+[\\s\\n\\r]*</a>

Please help me

Upvotes: 2

Views: 2114

Answers (2)

João Angelo

Reputation: 57728

You should stay away from regular expressions when it comes to parsing HTML and use an HTML parser such as the HTML Agility Pack.

To help you get started, here is how simple it is to parse that single anchor tag:

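using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();

// Load the anchor exactly as it appears in the question, line breaks included.
doc.LoadHtml(@"<a id=""[constant]""
      href=""[specific]""
    >GlobalPlatform Card Specification 2.2
    March, 2006</a>
");

// The anchor is the first child element named "a" of the document node.
var anchor = doc.DocumentNode.Element("a");

Console.WriteLine(anchor.Id);
Console.WriteLine(anchor.Attributes["href"].Value);
// InnerText returns the title, including the embedded line break.
Console.WriteLine(anchor.InnerText);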

Beats regular expressions, don't you think? :)

Upvotes: 6

Senad Meškin

Reputation: 13756

If you are using C#, you can specify the Multiline option when creating the Regex:

Regex r = new Regex(pattern, RegexOptions.Multiline);
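
One caveat worth checking: in .NET, RegexOptions.Multiline only changes where ^ and $ match (at every line instead of only at the start and end of the input), and RegexOptions.Singleline is the option that lets . match \n. Neither option changes what [^\n\r]+ matches, so the pattern itself probably needs to change. Below is a minimal sketch of one way to do that, replacing [^\n\r]+ with [^<]+ so the title may span lines. The id and href values, the class name AnchorRegexSketch and the method ExtractTitle are hypothetical placeholders for illustration, and the sketch assumes the title contains no nested tags.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorRegexSketch
{
    // Hypothetical placeholder values standing in for [constant] and [specific].
    const string Id = "spec-link";
    const string Href = "http://example.com/spec.pdf";

    // \s already matches \r and \n, so [\s\n\r] can be shortened to \s.
    // [^<]+? matches any characters up to the next tag, newlines included;
    // the lazy quantifier leaves trailing whitespace to the \s* that follows.
    static readonly Regex Anchor = new Regex(
        "<a\\s+id=\"" + Regex.Escape(Id) + "\"" +
        "\\s+href=\"" + Regex.Escape(Href) + "\"" +
        "\\s*>\\s*(?<title>[^<]+?)\\s*</a>");

    // Returns the anchor title, or null when the input does not match.
    public static string ExtractTitle(string html)
    {
        Match m = Anchor.Match(html);
        return m.Success ? m.Groups["title"].Value : null;
    }

    public static void Main()
    {
        string html =
            "<a id=\"spec-link\"\n" +
            "      href=\"http://example.com/spec.pdf\"\n" +
            "    >GlobalPlatform Card Specification 2.2\n" +
            "    March, 2006</a>";

        // The captured title keeps its internal line break and indentation.
        Console.WriteLine(ExtractTitle(html));
    }
}
```

The captured title still contains the line break and the indentation that follows it; if a single-line title is wanted, something like Regex.Replace(title, "\\s+", " ") would collapse them.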

Upvotes: 2
