Walking a WebBrowser control DOM - elements with both children and text

Question

I'm trying to walk the DOM of a WebBrowser control in C#, performing some processing on each HtmlElement. (I'm doing some transformations on the DOM at the same time, but for this discussion assume that I am trying to flatten the DOM by walking each node recursively.)

When I encounter something like:

Text with a link in the middle of it

I find an HtmlElement for the P tag (which contains the expected InnerText) and a child HtmlElement node corresponding to the tag A. The HtmlElement for the A tag contains the expected inner text.

But I cannot find any structures or attributes related just to the text before and after the A tag.

Is there a way to find the text before and after the text of the A tag other than the dreadful hack of comparing the InnerHtml property of the P tag with the OuterHtml property of the A tag?

Or is there another way to walk the IE DOM?

Sheng Jiang 蒋晟 · Accepted Answer

To get the text nodes in the DOM, query the parent element's underlying COM object (HtmlElement.DomElement in Windows Forms) for mshtml.IHTMLDOMNode. In C#, this QueryInterface (QI) call is just a type cast.

You can then get the direct child nodes via IHTMLDOMNode.childNodes. Enumerate that collection and look for nodes whose nodeType is 3 (text). If you also want the text nodes inside child elements, repeat the process for each child node of type 1 (element).
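
The steps above could be sketched roughly as follows. This is a minimal, untested sketch rather than a definitive implementation: it assumes the project references the Microsoft HTML Object Library (the mshtml interop assembly), and the class and method names (DomTextWalker, CollectText, Walk) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;
using mshtml; // assumption: COM reference to the Microsoft HTML Object Library

static class DomTextWalker
{
    const int ElementNode = 1; // nodeType of an element node
    const int TextNode = 3;    // nodeType of a text node

    // Hypothetical helper: returns the value of every text node under
    // `element`, in document order.
    public static List<string> CollectText(HtmlElement element)
    {
        var result = new List<string>();

        // DomElement exposes the underlying COM object; the cast is the QI.
        var node = element.DomElement as IHTMLDOMNode;
        if (node != null)
        {
            Walk(node, result);
        }
        return result;
    }

    static void Walk(IHTMLDOMNode parent, List<string> result)
    {
        // childNodes is typed as object in the interop assembly.
        var children = (IHTMLDOMChildrenCollection)parent.childNodes;

        for (int i = 0; i < children.length; i++)
        {
            var child = (IHTMLDOMNode)children.item(i);

            if (child.nodeType == TextNode)
            {
                // nodeValue is a VARIANT; for text nodes it holds the text.
                result.Add(child.nodeValue as string ?? string.Empty);
            }
            else if (child.nodeType == ElementNode)
            {
                // Recurse to pick up text inside child elements (e.g. the A tag).
                Walk(child, result);
            }
        }
    }
}
```

Called as, say, DomTextWalker.CollectText(webBrowser1.Document.Body) (where webBrowser1 is an assumed control name), this should return the text before the A tag, the link text, and the text after it as separate entries, so there is no need to compare InnerHtml against OuterHtml.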
