Sarim Sidd

Reputation: 2176

How to get only the parent tag's text from HTML in C#

I am trying to grab the text from a tag that has some child tags.

For example:

<p><span>Child Text </span><span class="price">Child Text</span><br />
I need this text</p>

This is what I am trying:

HtmlElement menuElement = browser.Document.GetElementsByTagName("p")[0];
String mytext = menuElement.InnerHtml;   // also tried InnerText, OuterHtml, OuterText

UPDATE: I think I have to use HtmlAgilityPack, so now my question is how to do this with the HtmlAgilityPack library. I'm new to it.

Thanks

Upvotes: 1

Views: 2646

Answers (3)

saeed sheikholeslami

Reputation: 98

There are many approaches to this, from regex to web scraping libraries. I recommend HtmlAgilityPack, which lets you address exactly what you need with XPath. Add a reference to HtmlAgilityPack and import its namespace. I'm also using LINQ, which requires .NET 3.5 or later. The code below does what you need.

using HtmlAgilityPack;
using System.Linq;

// The HtmlAgilityPack assembly must be referenced for these namespaces to resolve.

private void Form1_Load(object sender, EventArgs e)
{
    var rawData = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
    var html = new HtmlAgilityPack.HtmlDocument();
    html.LoadHtml(rawData);
    // "//p/text()" selects only the text nodes that are direct children of <p>,
    // so the text inside the child <span> elements is skipped.
    html.DocumentNode.SelectNodes("//p/text()").ToList().ForEach(x => MessageBox.Show(x.InnerHtml));
}
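
As a variation on the same idea, here is a minimal sketch that avoids the XPath `text()` step and instead filters the `<p>` element's direct children by node type. The class and method names (`ParentTextExample`, `GetOwnText`) are made up for illustration. It assumes only the first `<p>` matters and that the direct text pieces can be joined with a space.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ParentTextExample
{
    // Hypothetical helper: returns the text that belongs directly to the first <p>,
    // skipping anything nested inside child elements such as <span>.
    public static string GetOwnText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        var p = doc.DocumentNode.SelectSingleNode("//p");
        if (p == null)
        {
            return string.Empty;
        }

        // Keep only direct text-node children, decode entities, drop empty pieces.
        var parts = p.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }

    public static void Main()
    {
        var html = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />\nI need this text</p>";
        Console.WriteLine(GetOwnText(html));
    }
}
```

The null check matters because a page without a matching element would otherwise throw when the result is used.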

Upvotes: 2

matthewr

Reputation: 4739

You can get the text by splitting the DocumentText up into different parts.

string text = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
// Split on everything that precedes the wanted text; element [1] is "I need this text</p>".
text = text.Split(new string[] { "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />" }, StringSplitOptions.None)[1];
// We can remove the trailing </p> in many ways; here is one.
text = text.Split(new string[] { "</p>" }, StringSplitOptions.None)[0];
// text now has the value "I need this text"

Hope this helps!

Upvotes: 0

robrich

Reputation: 13205

It's much, much easier if you can put the "need this text" inside a span with an id. Then you just grab that element's InnerHtml. If you can't change the markup, you can grab menuElement's InnerHtml and string match for the content after "<br />", but that's quite fragile.
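
A rough sketch of that string-matching fallback, assuming the marker is the literal `<br />` from the sample markup and that the wanted text runs until the next tag or the end of the string. The class and method names are hypothetical. The fragility is real: a browser control may re-serialize the markup (for example as `<br>` or `<BR>`), in which case the marker would not match and this returns null.

```csharp
using System;

public static class AfterBreakExample
{
    // Hypothetical helper: returns the text following the last "<br />" marker,
    // or null when the marker is not present.
    public static string TextAfterLastBreak(string innerHtml)
    {
        const string marker = "<br />"; // assumed exact spelling of the tag

        int start = innerHtml.LastIndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return null;
        }

        string tail = innerHtml.Substring(start + marker.Length);

        // Stop at the next tag, if any, so trailing markup is not included.
        int nextTag = tail.IndexOf('<');
        if (nextTag >= 0)
        {
            tail = tail.Substring(0, nextTag);
        }

        return tail.Trim();
    }

    public static void Main()
    {
        var inner = "<span>Child Text </span><span class=\"price\">Child Text</span><br />\nI need this text";
        Console.WriteLine(TextAfterLastBreak(inner));
    }
}
```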

Upvotes: 0
