Reputation: 168
I'm trying to develop a web scraping tool. I've done this before for specific websites using HTML Agility Pack, but in this case I want the user to be able to specify what information they want to scrape by selecting text on the website.
What I don't know is this: if the user selects "Product 1", is there any way I can get the HTML tag (or something similar) that contains it, so that I can feed it to the algorithm and search the entire document for that same type of tag?
Product 1
Product description
Price $0.00
Upvotes: 0
Views: 172
Reputation: 2450
Load the HTML into an HtmlDocument object, then select the first node where the text input appears. The node has everything you might need:
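using HtmlAgilityPack;

var doc = new HtmlDocument();
string input = "Product 1";
doc.LoadHtml("Your HTML here"); // Or doc.Load(), depending on how you're getting your HTML
// First element whose text contains the input (the input must not contain a single quote)
HtmlNode selectedNode = doc.DocumentNode.SelectSingleNode(string.Format("//*[contains(text(),'{0}')]", input));
if (selectedNode != null) // SelectSingleNode returns null when nothing matches
{
    var tagName = selectedNode.Name;
    // GetAttributeValue returns the default instead of throwing when there is no class attribute
    var tagClass = selectedNode.GetAttributeValue("class", "");
    // etc.
}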
Of course, this all depends on the actual page structure: whether "Product 1" appears anywhere else, whether other elements on the page use the same kind of node as the one that contains "Product 1", and so on.
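To cover the second half of the question (searching the whole document for the same type of tag), here is a minimal sketch that reuses the tag name and class of the selected node to build a second XPath query. The class SimilarNodeFinder and the method FindSimilar are hypothetical names made up for this illustration, not part of HTML Agility Pack, and the sketch assumes that "same tag name and same class attribute" is a good enough definition of "same type" for your pages.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SimilarNodeFinder
{
    // Hypothetical helper: returns the trimmed inner text of every element that
    // shares the tag name and class attribute of the first element whose text
    // contains selectedText.
    public static List<string> FindSimilar(string html, string selectedText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the element the user selected.
        // Assumption: selectedText contains no single quote, since it is
        // embedded in a single-quoted XPath string literal.
        HtmlNode selected = doc.DocumentNode.SelectSingleNode(
            string.Format("//*[contains(text(),'{0}')]", selectedText));
        if (selected == null)
            return new List<string>();

        string tagClass = selected.GetAttributeValue("class", "");

        // "Same tag, same class"; fall back to the tag alone when there is no class.
        string xpath = tagClass.Length > 0
            ? string.Format("//{0}[@class='{1}']", selected.Name, tagClass)
            : "//" + selected.Name;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection matches = doc.DocumentNode.SelectNodes(xpath);
        return matches == null
            ? new List<string>()
            : matches.Select(n => n.InnerText.Trim()).ToList();
    }
}
```

For example, if each product title on the page were marked up as something like <h2 class="title">Product 1</h2> (assumed markup, since the real page is not shown), calling SimilarNodeFinder.FindSimilar(html, "Product 1") would return the text of every h2 element whose class is exactly "title". The @class comparison is an exact string match, so elements carrying additional classes would need a looser predicate such as contains(@class, ...).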
Upvotes: 0
Reputation: 3213
It seems like you want to query your DOM by a specific tag, similar to jQuery selectors. Take a look at the project below; it might be what you are looking for.
https://github.com/jamietre/csquery
Upvotes: 2