Priya

Reputation: 1425

Remove HTML node from HtmlDocument: HtmlAgilityPack

In my code, I want to remove every img tag that has no src value. I am using HtmlAgilityPack's HtmlDocument object. I find each img without a src value and try to remove it, but I get the error "Collection was modified; enumeration operation may not execute." Can anyone help me with this? The code I am using is:

foreach (HtmlNode node in doc.DocumentNode.DescendantNodes())
{
    if (node.Name.ToLower() == "img")
    {
        string src = node.Attributes["src"].Value;
        if (string.IsNullOrEmpty(src))
        {
            node.ParentNode.RemoveChild(node, false);
        }
    }
    else
    {
        ..........// i am performing other operations on document
    }
}

Upvotes: 13

Views: 28668

Answers (4)

MOHAMMAD

Reputation: 21

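// Requires: using System.Linq;
// Snapshot the img nodes whose src is missing or empty before removing anything.
var emptyElements = doc.DocumentNode
    .Descendants("img")
    .Where(x => x.Attributes["src"] == null || x.Attributes["src"].Value == String.Empty)
    .ToList();

// The list is a copy, so removing nodes here does not disturb an enumeration.
emptyElements.ForEach(node => {
    if (node != null){ node.Remove();}
});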

Upvotes: 1

Krzysztof Radzimski

Reputation: 4333

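// Requires: using System.Linq;
// Collect the XPath of every img whose src is missing or empty.
var emptyImages = doc.DocumentNode
    .Descendants("img")
    .Where(x => x.Attributes["src"] == null || x.Attributes["src"].Value == String.Empty)
    .Select(x => x.XPath)
    .ToList();

// Remove from last to first: positional XPaths such as img[2]
// shift when an earlier sibling is removed.
emptyImages.Reverse();

emptyImages.ForEach(xpath =>
{
    var node = doc.DocumentNode.SelectSingleNode(xpath);
    if (node != null) { node.Remove(); }
});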

Upvotes: 4

Priya

Reputation: 1425

What I have done is:

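    List<string> xpaths = new List<string>();
    foreach (HtmlNode node in doc.DocumentNode.DescendantNodes())
    {
        if (node.Name.ToLower() == "img")
        {
            // GetAttributeValue returns the default when src is missing,
            // so an img without the attribute does not throw.
            string src = node.GetAttributeValue("src", string.Empty);
            if (string.IsNullOrEmpty(src))
            {
                xpaths.Add(node.XPath);
            }
        }
    }

    // Remove from last to first: positional XPaths such as img[2]
    // shift when an earlier sibling is removed.
    for (int i = xpaths.Count - 1; i >= 0; i--)
    {
        doc.DocumentNode.SelectSingleNode(xpaths[i]).Remove();
    }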

Upvotes: 12

Oleks

Reputation: 32323

It seems you're modifying the collection while enumerating it, by calling the HtmlNode.RemoveChild method.

To fix this, copy your nodes to a separate list or array first, e.g. by calling Enumerable.ToList<T>() or Enumerable.ToArray<T>().

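// Requires: using System.Linq;
// SelectNodes returns null when nothing matches, so guard before copying.
var matches = doc.DocumentNode
    .SelectNodes("//img[not(string-length(normalize-space(@src)))]");

if (matches != null)
{
    var nodesToRemove = matches.ToList();
    foreach (var node in nodesToRemove)
        node.Remove();
}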

If I'm right, the problem will disappear.
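
The same failure can be reproduced without HtmlAgilityPack, which may make the cause easier to see. The sketch below is only an illustration: a plain List<string> stands in for the document's node collection, and the class and method names (SnapshotDemo, RemoveWhileEnumeratingThrows, RemoveFromSnapshot) are hypothetical, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Removes items from the list that is currently being enumerated.
    // Returns true if the enumerator reports the modification.
    public static bool RemoveWhileEnumeratingThrows()
    {
        var items = new List<string> { "a.png", "", "b.png" };
        try
        {
            foreach (var item in items)
            {
                if (string.IsNullOrEmpty(item))
                    items.Remove(item); // changes the list under the enumerator
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // "Collection was modified; enumeration operation may not execute."
            return true;
        }
    }

    // Enumerates a copy (ToList) and removes from the original list.
    public static List<string> RemoveFromSnapshot()
    {
        var items = new List<string> { "a.png", "", "b.png" };
        foreach (var item in items.ToList())
        {
            if (string.IsNullOrEmpty(item))
                items.Remove(item); // safe: the loop walks the copy
        }
        return items;
    }

    public static void Main()
    {
        Console.WriteLine(RemoveWhileEnumeratingThrows());         // True
        Console.WriteLine(string.Join(",", RemoveFromSnapshot())); // a.png,b.png
    }
}
```

The second method is the same pattern as the ToList() call above: the loop walks a snapshot, so removals from the original collection no longer invalidate the enumerator.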

Upvotes: 28
