mintuz

Reputation: 721

Html Agility Pack C# paragraph parsing problem

I am having a couple of issues with my code. I am trying to pull every paragraph from a page, but at the moment only the last paragraph ends up selected.

Here is my code:

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='body']/p"))
{
  string text = node.InnerText;
  lblTest2.Text = text;
}

Upvotes: 1

Views: 1774

Answers (2)

Kirk Woll

Reputation: 77616

IMO, XPath is no fun. I'd recommend using LINQ syntax instead:

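foreach (var node in doc.DocumentNode
    .DescendantNodes()
    .Single(x => x.Id == "body")   // the element with id="body"
    .DescendantNodes()
    .Where(x => x.Name == "p"))    // every <p> inside it, not only direct children as with /p
{
    string text = node.InnerText;
    // Append rather than assign; plain assignment keeps only the last paragraph
    lblTest2.Text += text + Environment.NewLine;
}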
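For comparison, here is a minimal sketch of the same LINQ query shape that runs without any NuGet package. It swaps in System.Xml.Linq (XDocument) as a stand-in for Html Agility Pack, so it only accepts well-formed markup; the class and method names are hypothetical. It also shows the difference between Elements("p"), which matches only direct children like the XPath step /p, and Descendants("p"), which would also pick up nested paragraphs.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphDemo
{
    // Returns the text of every <p> that is a direct child of the element
    // with id="body", joined with newlines.
    // Assumption: XDocument stands in for Html Agility Pack here, so the
    // input must be well-formed XML/XHTML.
    public static string DirectParagraphs(string markup)
    {
        XDocument doc = XDocument.Parse(markup);

        // Single() throws unless exactly one element has id="body"
        XElement body = doc.Descendants()
            .Single(x => (string)x.Attribute("id") == "body");

        // Elements("p") = direct children only, like "/p" in XPath;
        // body.Descendants("p") would also include nested <p> elements.
        return string.Join(Environment.NewLine,
            body.Elements("p").Select(p => p.Value));
    }
}
```

Called with "<html><div id='body'><p>First</p><p>Second</p><div><p>Nested</p></div></div></html>", it should return "First" and "Second" on separate lines and skip "Nested". Building the whole string with string.Join and assigning it to the label once also avoids the overwrite problem from the question.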

Upvotes: 1

Oded

Reputation: 499392

In your loop you take the current node's InnerText and assign it to the label. You do this for every node, so each assignment replaces the previous one and you only see the last paragraph; nothing preserves the earlier ones.

Try this:

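lblTest2.Text = string.Empty; // start from an empty label
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='body']/p"))
{
  string text = node.InnerText;
  // Append rather than overwrite, so earlier paragraphs are kept
  lblTest2.Text += text + Environment.NewLine;
}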
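The difference between assigning and appending can be seen without Html Agility Pack at all. This sketch replaces the nodes' InnerText values with a plain sequence of strings (a hypothetical stand-in for the parsed paragraphs) and compares the two loops. It swaps in StringBuilder for repeated +=; the resulting text is the same, but it avoids allocating a new string on every iteration when a page has many paragraphs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LabelDemo
{
    // Mimics the question's loop: each assignment replaces the previous
    // value, so only the last paragraph survives.
    public static string Overwrite(IEnumerable<string> paragraphs)
    {
        string label = string.Empty;
        foreach (string text in paragraphs)
        {
            label = text;
        }
        return label;
    }

    // Mimics the fix: append each paragraph followed by Environment.NewLine,
    // so all of them are kept (equivalent to label += text + Environment.NewLine).
    public static string Accumulate(IEnumerable<string> paragraphs)
    {
        var sb = new StringBuilder();
        foreach (string text in paragraphs)
        {
            sb.AppendLine(text);
        }
        return sb.ToString();
    }
}
```

For the input { "one", "two", "three" }, Overwrite returns only "three", while Accumulate returns all three lines, each followed by a newline.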

Upvotes: 4
