Reputation: 2111
I'm trying to walk the DOM of a WebBrowser control using C# and perform some processing on each HtmlElement. (I'm doing some transformations on the DOM at the same time, but for this discussion assume that I am trying to flatten the DOM by walking each node recursively.)
When I encounter something like:
<p>Text with a <a href="http://www.example.com/">link</a> in the middle of it </p>
I find an HtmlElement for the P tag (which contains the expected InnerText) and a child HtmlElement node corresponding to the tag A. The HtmlElement for the A tag contains the expected inner text.
But I cannot find any structures or attributes related just to the text before and after the A tag.
Is there a way to find the text before and after the text of the A tag other than the dreadful hack of comparing the InnerHtml property of the P tag with the OuterHtml property of the A tag?
Or is there another way to walk the IE DOM?
Upvotes: 1
Views: 1044
Reputation: 15281
To get text nodes in the DOM, QI (a type cast in C#) the parent element (HtmlElement.DomElement in Windows Forms) for mshtml.IHTMLDOMNode. You can then get its direct child nodes via IHTMLDOMNode.childNodes. Enumerate that collection and look for nodes whose nodeType is 3 (text). If you also want the text nodes inside child elements, repeat the process for each child node of type 1 (element).
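A minimal sketch of that approach, assuming the project references the Microsoft.mshtml interop assembly and that the document has finished loading. The class and method names (DomTextWalker, CollectText, Walk) are made up for illustration and are not part of any API:

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using mshtml; // assumes a reference to the Microsoft.mshtml interop assembly

// Hypothetical helper; the names here are illustrative only.
static class DomTextWalker
{
    const int ElementNode = 1; // DOM nodeType for elements
    const int TextNode = 3;    // DOM nodeType for text

    // Returns the text nodes under the given element, in document order.
    public static List<string> CollectText(HtmlElement element)
    {
        var result = new List<string>();

        // The "QI": cast the underlying COM object to IHTMLDOMNode.
        var node = element.DomElement as IHTMLDOMNode;
        if (node != null)
        {
            Walk(node, result);
        }
        return result;
    }

    static void Walk(IHTMLDOMNode parent, List<string> result)
    {
        // childNodes is exposed as an object; cast it to the collection interface.
        var children = (IHTMLDOMChildrenCollection)parent.childNodes;

        for (int i = 0; i < children.length; i++)
        {
            var child = (IHTMLDOMNode)children.item(i);

            if (child.nodeType == TextNode)
            {
                // For a text node, nodeValue holds the text itself.
                result.Add(Convert.ToString((object)child.nodeValue));
            }
            else if (child.nodeType == ElementNode)
            {
                // Recurse into child elements such as the A tag.
                Walk(child, result);
            }
        }
    }
}
```

Called as DomTextWalker.CollectText(webBrowser1.Document.Body) from a DocumentCompleted handler (webBrowser1 being whatever your control is named), this should return the text before the link, the link text, and the text after the link as separate entries for the P example in the question. Exactly how whitespace-only text nodes show up may depend on the IE document mode.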
Upvotes: 1