Reputation:
I was using Fizzler.Systems.HtmlAgilityPack in .NET to get elements using CSS selectors. Now I'm porting my project to .NET Core, and there doesn't seem to be a Fizzler package for it, although HtmlAgilityPack.NetCore is available. How do I use CSS selectors?
Upvotes: 2
Views: 3132
Reputation:
Just add the HtmlAgilityPack.CssSelectors.NetCore NuGet package reference to your project.
Here is an example of how to use the QuerySelectorAll() method.
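HtmlWeb web = new HtmlWeb();
// url is the address of the page to load
HtmlDocument doc = web.Load(url);
// All nodes matching the selector
IList<HtmlNode> nodes = doc.QuerySelectorAll("div .my-class[data-attr=123] > ul li");
// First match inside the first result (assumes nodes is not empty)
HtmlNode node = nodes[0].QuerySelector("p.with-this-class span[data-myattr]");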
Source: github.com/trenoncourt
Upvotes: 2
Reputation: 11348
For CSS selectors in .NET, I have always used ScrapySharp (although it does not support pseudo-elements).
Add ScrapySharp.Extensions to your using statements, and all you need to do is invoke CssSelect on any HtmlNode object, such as DocumentNode.
using ScrapySharp.Extensions;
using HtmlAgilityPack;

namespace ConsoleLab
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            HtmlWeb web = new HtmlWeb();
            var document = web.Load("your url");
            // CSS class selector example
            var res1 = document.DocumentNode.CssSelect(".yourClass");
            // CSS id selector example
            var res2 = document.DocumentNode.CssSelect("#yourID");
        }
    }
}
An alternative solution is to use AngleSharp, which is an all-in-one package for parsing and CSS selection (CSS selectors are built in). It has been a while since I used either library, but if I am not mistaken, AngleSharp offers better CSS selector support.
AngleSharp usage examples:
// Parsing an HTTP-served URL (OpenAsync is asynchronous; .Result is used here for simplicity, but it blocks and defeats the purpose of the async API)
IBrowsingContext bc = BrowsingContext.New();
Task<IDocument> doc = bc.OpenAsync("yourAddress");
// Querying a single selector match
IElement element1 = doc.Result.QuerySelector(".yourSelector");
// Querying multiple selector matches
IEnumerable<IElement> elements1 = doc.Result.QuerySelectorAll(".yourSelectors");
// Parsing HTML source directly, with no network dependency
// (the argument is the markup itself, not a file path; newer AngleSharp releases name this method ParseDocument)
HtmlParser parser = new HtmlParser();
IHtmlDocument doc2 = parser.Parse("htmlFile");
// Query the locally parsed document (doc2), not the network one
IElement element2 = doc2.QuerySelector(".yourSelector");
IEnumerable<IElement> elements2 = doc2.QuerySelectorAll(".yourSelectors");
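Since the snippet above notes that .Result defeats the asynchronous API, here is a minimal sketch of the same query written with await. It assumes a reasonably recent AngleSharp package and C# 7.1 or later (for async Main). The class name AngleSharpAsyncSketch, the helper SelectTextAsync, and the sample markup are hypothetical, and the HTML is supplied in memory so the sketch does not depend on the network. To fetch a real address you would pass the address string to OpenAsync instead and, as far as I recall, create the context with Configuration.Default.WithDefaultLoader().

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Dom;

internal static class AngleSharpAsyncSketch
{
    // Hypothetical helper: returns the text of every element matching the selector.
    public static async Task<string[]> SelectTextAsync(string html, string selector)
    {
        IBrowsingContext context = BrowsingContext.New(Configuration.Default);

        // Await the document instead of blocking on .Result.
        // The markup is served from memory through a virtual response.
        IDocument document = await context.OpenAsync(req => req.Content(html));

        return document.QuerySelectorAll(selector)
                       .Select(element => element.TextContent)
                       .ToArray();
    }

    private static async Task Main()
    {
        // Sample markup, made up for illustration.
        string html = "<ul><li class='item'>one</li><li class='item'>two</li><li>three</li></ul>";

        string[] texts = await SelectTextAsync(html, ".item");
        Console.WriteLine(string.Join(", ", texts)); // prints "one, two"
    }
}
```

Awaiting keeps the calling thread free while the document loads, which matters most in UI and server code where blocking on .Result can also cause deadlocks.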
Upvotes: 0
Reputation: 260
I used HtmlAgilityPack as shown below:
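// Requires: using System.Collections.Generic; using System.Linq; using System.Net; using HtmlAgilityPack;
string url = "your URL";
HtmlWeb web = new HtmlWeb();
// Set a 15-second timeout on the underlying request before it is sent
web.PreRequest = delegate (HttpWebRequest webRequest)
{
    webRequest.Timeout = 15000;
    return true;
};
// Load the page through the configured HtmlWeb instance
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
// Select every div whose class attribute contains "YourClassName"
List<HtmlNode> findclasses = doc.DocumentNode.Descendants("div").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("YourClassName")
).ToList();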
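One caveat with the Contains check on the class attribute: it is a substring match, so looking for "btn" would also match an element whose class is "btn-primary". A CSS class selector matches whole whitespace-separated tokens instead. The sketch below shows one way to get that behavior with plain string handling. ClassMatchSketch and HasClass are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;

internal static class ClassMatchSketch
{
    // Hypothetical helper: true only when className is one of the
    // whitespace-separated tokens in the class attribute value.
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrWhiteSpace(classAttribute))
        {
            return false;
        }

        // Passing a null separator array splits on any whitespace.
        return classAttribute
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Contains(className, StringComparer.Ordinal);
    }

    private static void Main()
    {
        Console.WriteLine("btn-primary large".Contains("btn"));  // True: substring match
        Console.WriteLine(HasClass("btn-primary large", "btn")); // False: no such token
        Console.WriteLine(HasClass("btn large", "btn"));         // True: exact token
    }
}
```

In the query above, the predicate could then be written as d => HasClass(d.GetAttributeValue("class", ""), "YourClassName"), assuming the helper is in scope.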
Upvotes: 0