I am working on a .NET (C#) application that loads and processes an HTML file. I need to get the IDs of the HTML elements in that file, and I want to use a regular expression for that. I've tried several patterns with no luck. For example, given the line:
<a href="#" id="thisAnchor" >Link to somewhere</a><div id="divToCollect">BigDiv</div>
I want to get thisAnchor and divToCollect. I am calling:
Regex.Matches(currentLine, expression);
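For the narrow input shown above, here is a minimal sketch of what expression could look like. It assumes every id value is wrapped in matching single or double quotes; the helper name ExtractIds is hypothetical. It will still misfire on id="..." text inside comments, scripts, or other attribute values, which is the usual argument for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls id values out of a single line of HTML.
// Assumes each id value is enclosed in matching ' or " quotes.
static List<string> ExtractIds(string line)
{
    // (?<![\w-])  "id" must not be the tail of a longer name such as data-id
    // id\s*=\s*   the attribute name, optional whitespace around '='
    // (["'])      group 1 captures the opening quote
    // (.*?)       group 2 lazily captures the value
    // \1          the closing quote must match the opening one
    var pattern = @"(?<![\w-])id\s*=\s*([""'])(.*?)\1";
    var ids = new List<string>();
    foreach (Match m in Regex.Matches(line, pattern, RegexOptions.IgnoreCase))
    {
        ids.Add(m.Groups[2].Value);
    }
    return ids;
}

string currentLine = "<a href=\"#\" id=\"thisAnchor\" >Link to somewhere</a><div id=\"divToCollect\">BigDiv</div>";
Console.WriteLine(string.Join(", ", ExtractIds(currentLine)));
// prints: thisAnchor, divToCollect
```

The backreference \1 keeps a value like id="it's" from being cut off at the apostrophe, and the lookbehind prevents data-id from being reported as an id.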
You should not use regex for that. Use HtmlAgilityPack instead, and you will have no trouble getting all the attributes you need:
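using System.Linq;
using HtmlAgilityPack;

string html = "<div id='divid'></div><a id='anchorid'></a>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Collect the id of every <div> that has one (call Descendants() with no argument to include all elements)
var divIds = doc.DocumentNode
    .Descendants("div")
    .Where(div => div.Attributes["id"] != null)
    .Select(div => div.Attributes["id"].Value)
    .ToList();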
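The question asks for the IDs of all elements (both the <a> and the <div>), not only divs. A sketch of that variant, assuming the HtmlAgilityPack NuGet package is referenced; GetAllIds is a hypothetical helper name. It uses HtmlNode.GetAttributeValue with an empty-string default so elements without an id are simply filtered out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: returns the id of every element that has a non-empty one, in document order.
static List<string> GetAllIds(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    return doc.DocumentNode
        .Descendants()                                    // every node, not just one tag name
        .Where(n => n.NodeType == HtmlNodeType.Element)   // skip text and comment nodes
        .Select(n => n.GetAttributeValue("id", string.Empty))
        .Where(id => id.Length > 0)
        .ToList();
}

string html = "<a href=\"#\" id=\"thisAnchor\" >Link to somewhere</a><div id=\"divToCollect\">BigDiv</div>";
Console.WriteLine(string.Join(", ", GetAllIds(html)));
```

Because the parser understands attributes, quoting style, whitespace, and look-alike names such as data-id are handled without any pattern tuning.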