Cornel Gheorghiţă

Reputation: 11

Retrieve all the ids for a given sentence using regex in C#

I am working on a .NET (C#) application that retrieves and processes an HTML file. I need to get the ids of the HTML elements in that file, and I want to use a regular expression for that. I've tried some combinations, but with no luck. For example, if I have the line:

<a href="#" id="thisAnchor" >Link to somewhere</a><div id="divToCollect">BigDiv</div>

I want to get: thisAnchor and divToCollect. I am using Regex:

Regex.Matches(currentLine, expression);
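
For reference, a pattern of the kind being asked about might look like the sketch below. It is only an illustration: the class name `IdExtractor`, the method `GetIds`, and the pattern itself are assumptions and not part of the original post. It handles double- or single-quoted `id` values on a single line, but it will also match `id="..."` text that appears outside a tag (for example inside a comment or script), and it ignores unquoted values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Hypothetical pattern: whitespace, then id = "value" or id = 'value'.
    // The leading \s keeps attributes such as data-id from matching.
    private static readonly Regex IdPattern = new Regex(
        @"\sid\s*=\s*(?:""(?<id>[^""]*)""|'(?<id>[^']*)')",
        RegexOptions.IgnoreCase);

    public static List<string> GetIds(string currentLine)
    {
        var ids = new List<string>();
        foreach (Match match in IdPattern.Matches(currentLine))
        {
            // The named group "id" holds the text between the quotes.
            ids.Add(match.Groups["id"].Value);
        }
        return ids;
    }
}
```

Calling `IdExtractor.GetIds(currentLine)` on the example line above should yield thisAnchor and divToCollect, in that order.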

Upvotes: 1

Views: 78

Answers (1)

Davor Zlotrg

Reputation: 6050

You should not use regex for that. Use HtmlAgilityPack instead, and you will have no problems getting all the attributes you need:

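// Requires the HtmlAgilityPack NuGet package.
using System.Linq;
using HtmlAgilityPack;

string html = "<div id='divid'></div><a id='ancorid'></a>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Collect the id of every <div> that has one.
var divIds = doc.DocumentNode
                .Descendants("div")
                .Where(div => div.Attributes["id"] != null)
                .Select(div => div.Attributes["id"].Value)
                .ToList();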
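
The snippet above only looks at `div` elements, while the question asks for the ids of all elements. A minimal sketch of how the same approach could be widened is shown below. The wrapper class `HtmlIdCollector` and method `GetAllIds` are hypothetical names introduced for illustration; the only change in technique is calling `Descendants()` without a tag name so that every node is considered.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class HtmlIdCollector
{
    // Returns the id attribute of every element in the markup, in document order.
    public static List<string> GetAllIds(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants()                                   // all nodes, not just <div>
                  .Where(node => node.Attributes["id"] != null)    // keep nodes that carry an id
                  .Select(node => node.Attributes["id"].Value)
                  .ToList();
    }
}
```

For the line in the question, `HtmlIdCollector.GetAllIds(currentLine)` should return thisAnchor followed by divToCollect. Because the input is parsed rather than pattern-matched, text that merely looks like an id attribute inside element content is not picked up.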

Upvotes: 1
