Reputation: 31
I am developing an app that has to fetch data from a website. The relevant part of the page looks like this:
<div id="id1" class="class1">
<ol class="cls_ol">
<li>
<div class="class2">Content 1</div>
<div class="cls_img">
*** Code for some image ***
</div>
Content 2
</li>
<li> *** Same like above <li> *** </li>
<li> *** Same like above <li> *** </li>
</ol>
</div>
I use the following code to fetch it:
protected void Button1_Click(object sender, EventArgs e)
{
    var obj = new HtmlWeb();
    var document = obj.Load(" ** url of a website ** ");
    var bold = document.DocumentNode.SelectNodes("//div[@class='class1']");
    foreach (var i in bold)
    {
        Response.Write(i.InnerHtml);
    }
}
The problem is that this code also fetches the images inside <div class="cls_img"></div>, which I don't need. How can I fetch all the content of <div id="id1" class="class1"> without the image from <div class="cls_img">?
Upvotes: 0
Views: 941
Reputation: 32333
Step 1 - select and remove the images inside the <div class="cls_img"> that sits inside the <div class="class1"> tag:
var images = document.DocumentNode.SelectNodes(
    "//div[@class='class1']//*//div[@class='cls_img']//img"
);
// SelectNodes returns null when no nodes are found, so guard before iterating
if (images != null)
{
    foreach (var image in images)
    {
        image.Remove();
    }
}
Step 2 - select the <div class="class1"> elements (you have already done this) - now without those images:
var bold = document.DocumentNode.SelectNodes("//div[@class='class1']");
foreach (var node in bold)
{
Console.Write(node.InnerHtml);
}
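The two steps above can be put together in a small console sketch. It assumes the HtmlAgilityPack NuGet package is installed and, so that it runs without a network call, loads placeholder markup from a string instead of a URL; the sample HTML, the placeholder.png image and the class name RemoveImagesSketch are made up for illustration. The XPath is slightly simplified compared to the one in Step 1.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, assumed to be installed

class RemoveImagesSketch
{
    static void Main()
    {
        // Placeholder markup modelled on the structure shown in the question.
        var html = @"<div id=""id1"" class=""class1""><ol class=""cls_ol""><li>
<div class=""class2"">Content 1</div>
<div class=""cls_img""><img src=""placeholder.png""></div>
Content 2</li></ol></div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Step 1: remove every <img> under div.cls_img inside div.class1.
        var images = document.DocumentNode.SelectNodes(
            "//div[@class='class1']//div[@class='cls_img']//img");
        if (images != null) // SelectNodes returns null when nothing matches
        {
            foreach (var image in images)
            {
                image.Remove();
            }
        }

        // Step 2: print what is left of div.class1.
        var container = document.DocumentNode.SelectSingleNode("//div[@class='class1']");
        if (container != null)
        {
            Console.WriteLine(container.InnerHtml);
        }
    }
}
```

Note that this leaves the now-empty <div class="cls_img"> wrapper in the output; only the <img> elements are removed.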
Upvotes: 1
Reputation: 3481
Loop through the nodes, find the node whose class attribute is "cls_img", and remove it:
node.ParentNode.RemoveChild(node);
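A minimal sketch of that loop, again assuming HtmlAgilityPack and placeholder markup loaded from a string (the sample HTML and the class name RemoveDivSketch are illustrative, not from the question). Unlike removing only the <img> elements, this drops the whole <div class="cls_img"> wrapper. The matches are copied to a list first, since modifying the tree while enumerating it is best avoided.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package, assumed to be installed

class RemoveDivSketch
{
    static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml(
            "<div class='class1'><div class='class2'>Content 1</div>" +
            "<div class='cls_img'><img src='placeholder.png'></div>Content 2</div>");

        // Collect every <div> whose class attribute is exactly "cls_img".
        // ToList() takes a snapshot so the tree is not modified during enumeration.
        var toRemove = document.DocumentNode
            .Descendants("div")
            .Where(d => d.GetAttributeValue("class", "") == "cls_img")
            .ToList();

        foreach (var node in toRemove)
        {
            // Detach the node (and its children) from its parent.
            node.ParentNode.RemoveChild(node);
        }

        Console.WriteLine(document.DocumentNode.OuterHtml);
    }
}
```

The comparison is an exact match on the class attribute; an element carrying several classes (for example class="cls_img large") would need a different check.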
Upvotes: 0