Sagar Kadam

Reputation: 31

Fetch data from website using HtmlAgilityPack

I am developing an app that has to fetch data from a website. The page markup looks like this:

<div id="id1" class="class1">
    <ol class="cls_ol">
    <li>
       <div class="class2">Content 1</div>
       <div class="cls_img">
                *** Code for some image ***
       </div>
       Content 2
    </li>
    <li>  *** Same like above <li> ***  </li>
    <li>  *** Same like above <li> ***  </li>
    </ol>
</div>

I use the following code to fetch it:

protected void Button1_Click(object sender, EventArgs e)
{
    var obj = new HtmlWeb();
    var document = obj.Load(" ** url of a website ** ");

    var bold = document.DocumentNode.SelectNodes("//div[@class='class1']");

    foreach (var i in bold)
    {
        Response.Write(i.InnerHtml);
    }
}

The problem is that this code also fetches the images inside <div class="cls_img"></div>, which I don't need. How can I fetch all the content of <div id="id1" class="class1"> without the images from <div class="cls_img">?

Upvotes: 0

Views: 941

Answers (2)

Oleks

Reputation: 32333

Step 1 - select and remove images inside the <div class="cls_img"> inside the <div class="class1"> tag:

  var images = document.DocumentNode.SelectNodes(
      "//div[@class='class1']//*//div[@class='cls_img']//img"
  );

  // SelectNodes returns null (not an empty collection) when nothing matches,
  // so guard the loop to avoid a NullReferenceException
  if (images != null)
  {
      foreach (var image in images)
      {
          image.Remove();
      }
  }

Step 2 - select the <div class="class1"> elements (as you have already done), which now no longer contain those images:

  var bold = document.DocumentNode.SelectNodes("//div[@class='class1']");
  foreach (var node in bold)
  {
      Console.Write(node.InnerHtml);
  }

Upvotes: 1

chandmk

Reputation: 3481

Loop through the nodes, find each node whose class attribute is "cls_img", and remove that node from its parent:

node.ParentNode.RemoveChild(node);
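
For context, here is a minimal sketch of how that loop might look end to end. It is only an illustration: the sample markup is modeled on the question, the `a.png` image and the variable names are made up, and it loads the HTML from a string with `LoadHtml` instead of fetching a real URL.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup modeled on the question; "a.png" is a placeholder.
        var html =
            "<div id=\"id1\" class=\"class1\"><ol class=\"cls_ol\"><li>" +
            "<div class=\"class2\">Content 1</div>" +
            "<div class=\"cls_img\"><img src=\"a.png\"></div>" +
            "Content 2</li></ol></div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Find every cls_img div nested anywhere under a class1 div.
        // SelectNodes returns null when nothing matches, hence the guard.
        var imageDivs = document.DocumentNode.SelectNodes(
            "//div[@class='class1']//div[@class='cls_img']");

        if (imageDivs != null)
        {
            foreach (var node in imageDivs)
            {
                // Detach the whole image wrapper from its parent element.
                node.ParentNode.RemoveChild(node);
            }
        }

        // Print what is left of the container, now without the image divs.
        var container = document.DocumentNode.SelectSingleNode("//div[@class='class1']");
        if (container != null)
        {
            Console.WriteLine(container.InnerHtml);
        }
    }
}
```

Removing the whole `cls_img` div, not just the `img` inside it, also avoids leaving an empty wrapper behind in the output.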

Upvotes: 0
