Jorge Limas

Reputation: 168

Web Scraping using non-defined tags

I'm trying to develop a web scraping tool. I've done this before for specific websites using HTML Agility Pack, but in this case I want the user to be able to specify what information they want to scrape by selecting text on the website.

What I don't know is this: if the user selects "Product 1", is there any way I can get the HTML tag (or something similar) that contains it, so I can feed it to the algorithm and search for that same type of tag in the entire document?

Product 1

Product description

Price $0.00

Upvotes: 0

Views: 172

Answers (2)

rikitikitik

Reputation: 2450

Load the HTML into an HtmlDocument object, then select the first node where the text input appears. The node has everything you might need:

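    var doc = new HtmlDocument(); // requires: using HtmlAgilityPack;
    string input = "Product 1"; // the text the user selected
    doc.LoadHtml("Your HTML here"); // or doc.Load(), depending on how you're getting your HTML

    // Returns null if no element matches; a single quote in input would break this XPath
    HtmlNode selectedNode = doc.DocumentNode.SelectSingleNode(string.Format("//*[contains(text(),'{0}')]", input));

    if (selectedNode != null)
    {
        var tagName = selectedNode.Name;
        // GetAttributeValue returns the default instead of throwing when "class" is missing
        var tagClass = selectedNode.GetAttributeValue("class", "");
        // etc.
    }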

Of course this all depends on the actual page structure: whether "Product 1" appears anywhere else, whether other elements on the page use the same tag and class as the node that contains "Product 1", etc.
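To then search the whole document for "that same type of tag", one option is to build a second XPath from the selected node's tag name and class. The following is only a rough sketch under some assumptions: `SimilarNodeFinder` and `FindSimilar` are hypothetical names I made up for illustration, the selected text is assumed to contain no single quote, and matching on the exact `class` attribute value is a simplification that may be too strict or too loose for a real page.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SimilarNodeFinder
{
    // Hypothetical helper: returns the text of every element that shares the
    // tag name (and exact class value, if any) of the first element whose
    // text contains selectedText.
    public static List<string> FindSimilar(string html, string selectedText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the element the user selected (assumes no single quote in selectedText).
        HtmlNode selected = doc.DocumentNode.SelectSingleNode(
            string.Format("//*[contains(text(),'{0}')]", selectedText));
        if (selected == null)
            return new List<string>();

        // Build an XPath that matches the same tag, narrowed by class when one is present.
        string cls = selected.GetAttributeValue("class", "");
        string xpath = cls.Length > 0
            ? string.Format("//{0}[@class='{1}']", selected.Name, cls)
            : "//" + selected.Name;

        // SelectNodes returns null rather than an empty collection when nothing matches.
        HtmlNodeCollection matches = doc.DocumentNode.SelectNodes(xpath);
        if (matches == null)
            return new List<string>();

        return matches
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}
```

Calling `SimilarNodeFinder.FindSimilar(html, "Product 1")` on a page where each product name sits in something like `<h2 class="name">` would return the text of all those headings. If the tag and class alone match too much, the selected node's parent chain (for example via `selected.XPath`) could be used to narrow the query further.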

Upvotes: 0

Sergey

Reputation: 3213

It seems like you want to query your DOM by a specific tag, similar to jQuery selectors. Take a look at the project below; it might be what you are looking for.

https://github.com/jamietre/csquery

Upvotes: 2
