Reputation: 456
HTML
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<form action="demo_form.asp" id="form1" method="get">
First name: <input type="text" name="fname"><br>
Last name: <input type="text" name="lname"><br>
<input type="submit" value="Submit">
</form>
</body>
</html>
Code
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(File.ReadAllText(@"C:\sample.html"));
HtmlNode nd = doc.DocumentNode.SelectSingleNode("//form[@id='form1']");
//nd.InnerHtml is "".
//nd.InnerText is "".
Problem
nd.ChildNodes //Collection(to get all nodes in form) is always null.
nd.SelectNodes("/input") //returns null.
nd.SelectNodes("./input") //returns null.
"//form[@id='form1']/input" //returns null.
what i want is to access childnodes of form tag with id=form1 one by one in order of occurrence. I tried same xpath in chrome developer console and it works just exactly the way i wanted. Is HTMlAgility pack is having problem in reading html from file or Web.
Upvotes: 1
Views: 753
Reputation: 3463
Try adding the following statement before loading the document:
HtmlNode.ElementsFlags.Remove("form");
HtmlAgilityPack's default behaviour adds all the form's inner-elements as siblings in stead of children. The statement above alters that behaviour so that they (meaning the input tags) will appear as childnodes.
Your code would look like this:
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(File.ReadAllText(@"C:\sample.html"));
HtmlNode nd = doc.DocumentNode.SelectSingleNode("//form[@id='form1']");
etc...
references:
Upvotes: 0
Reputation: 301
Your html is invalid and may be preventing the html agility pack from working properly.
Try adding a doctype (and an xml namespace) to the start of your document and change your input element's closing tags from > to />
Upvotes: 1