Reputation: 2176
I am trying to grab the text from a tag that also has some child tags.
For example:
<p><span>Child Text </span><span class="price">Child Text</span><br />
I need this text</p>
This is what I am trying:
// GetElementsByTagName returns a collection, so take the first <p> element
HtmlElement menuElement = browser.Document.GetElementsByTagName("p")[0];
string mytext = menuElement.InnerHtml; // also tried InnerText, OuterHtml, OuterText
UPDATE: I think I have to use HtmlAgilityPack, so my question now is how to do this with the HtmlAgilityPack library. I'm new to it.
Thanks
Upvotes: 1
Views: 2646
Reputation: 98
There are many approaches to this, from regex to web scraping libraries. I recommend HtmlAgilityPack, because it lets you address exactly the node you need with XPath. Add a reference to HtmlAgilityPack and import its namespace. I'm also using LINQ, which requires .NET 3.5 or later. With the code below you can do that.
using HtmlAgilityPack;
using System.Linq;
// these references must be available.
private void Form1_Load(object sender, EventArgs e)
{
    var rawData = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
    var html = new HtmlAgilityPack.HtmlDocument();
    html.LoadHtml(rawData);
    // "//p/text()" selects only the text nodes that are direct children of <p>
    html.DocumentNode.SelectNodes("//p/text()").ToList().ForEach(x => MessageBox.Show(x.InnerHtml));
}
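If you want the text back as values rather than in message boxes, something like the following might work. It is only a sketch: `DirectTextSketch` and `DirectParagraphText` are names I made up for illustration, and it assumes the HtmlAgilityPack package is referenced. It guards against `SelectNodes` returning null when nothing matches, and trims the whitespace that a line break after `<br />` would leave behind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class DirectTextSketch
{
    // Hypothetical helper: collects the text nodes that are direct children of each <p>.
    public static List<string> DirectParagraphText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//p/text()");
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()) // decode entities such as &amp;
            .Where(t => t.Length > 0)                               // skip whitespace-only nodes
            .ToList();
    }
}
```

Calling `DirectTextSketch.DirectParagraphText(rawData)` with the sample markup above should give back only the text that sits directly inside the `<p>`, not the text of the child spans.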
Upvotes: 2
Reputation: 4739
You can get the text by splitting the DocumentText up into different parts.
string text = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
text = text.Split(new[] { "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />" }, StringSplitOptions.None)[1];
// Splits off the first part of the text, leaving us with "I need this text</p>"
// We can remove the trailing </p> in many ways, but here I will show you one way.
text = text.Split(new[] { "</p>" }, StringSplitOptions.None)[0];
// text now has the value "I need this text"
Hope this Helps!
Upvotes: 0
Reputation: 13205
It's much, much easier if you can put the "need this text" inside a span with an id; then you just grab that element's InnerHtml. If you can't change the markup, you can grab menuElement's InnerHtml and string-match for the content after "<br />", but that's quite fragile.
Upvotes: 0
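For the string-matching fallback mentioned in the last answer, a minimal sketch could look like the one below. `TrailingTextSketch` and `TextAfterLastBreak` are hypothetical names, and the input string stands in for whatever the paragraph's InnerHtml actually returns. It searches for `<br` case-insensitively on the assumption that the browser may normalize the tag's casing or spacing, which is part of why this approach is fragile.

```csharp
using System;

static class TrailingTextSketch
{
    // Hypothetical helper: returns the text that follows the last <br> tag
    // in a paragraph's inner HTML.
    public static string TextAfterLastBreak(string innerHtml)
    {
        // Locate the start of the last <br ...> tag, ignoring case.
        int brStart = innerHtml.LastIndexOf("<br", StringComparison.OrdinalIgnoreCase);
        if (brStart < 0)
        {
            // No <br> at all: fall back to the whole string.
            return innerHtml.Trim();
        }

        // Locate the '>' that closes that tag.
        int brEnd = innerHtml.IndexOf('>', brStart);
        if (brEnd < 0)
        {
            return string.Empty;
        }

        return innerHtml.Substring(brEnd + 1).Trim();
    }

    static void Main()
    {
        // Stand-in for menuElement.InnerHtml (assumed shape, taken from the question's sample).
        string innerHtml = "<span>Child Text </span><span class=\"price\">Child Text</span><br />\nI need this text";
        Console.WriteLine(TextAfterLastBreak(innerHtml)); // prints "I need this text"
    }
}
```

This breaks as soon as another tag follows the text or the text itself contains a `<br>`, so the XPath approach is the safer choice when HtmlAgilityPack is available.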