Gene

Reputation: 2218

Extract a value out of html using HtmlAgilityPack

I'm new to C# and HtmlAgilityPack, and I've been trying to get the value of signup_form_id (which is 2079787163) from the following HTML:

<form name="setupform" id="setupform" method="post" action="/signup/" target="_top">
<input type="hidden" name="form_type" value="blog" />
<input type="hidden" name="stage" value="" />
<input type="hidden" name="loc" value="signup" />
<input type='hidden' name='signup_form_id' value='2079787163' />
<input type="hidden" id="_signup_form" name="_signup_form" value="9783b65654" />

Here's my code:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("https://signup.wordpress.com/signup/");
var value = doc.DocumentNode.SelectSingleNode("//form[@name='signup_form_id'");
Console.WriteLine(value.InnerText);

I know something is wrong with my XPath expression, but I have no idea what. Can anyone give me some suggestions? Thanks a lot!

Upvotes: 2

Views: 7413

Answers (1)

Cristian Lupascu

Reputation: 40526

First of all, your code fails on the doc.Load line, because HtmlDocument.Load(string) treats its argument as a path to a local file, not as a URI. To download the HTML from the web, use the Load method of the HtmlWeb class instead.
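To illustrate what HtmlDocument.Load does accept, here is a minimal sketch that writes the question's markup (trimmed, with the form closed) to a temporary file and then loads it by path. It assumes a project reference to the HtmlAgilityPack NuGet package; the class name, method name and temp-file approach are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class LoadFromFileDemo
{
    // Trimmed copy of the markup from the question (form closed so the sample is complete).
    const string SampleHtml =
        "<form name=\"setupform\" id=\"setupform\" method=\"post\" action=\"/signup/\" target=\"_top\">" +
        "<input type='hidden' name='signup_form_id' value='2079787163' />" +
        "</form>";

    // Saves the markup to a temp file and loads it with HtmlDocument.Load,
    // which expects a file path on disk rather than a URL.
    public static string LoadFormIdFromFile()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, SampleHtml);

            var doc = new HtmlDocument();
            doc.Load(path); // a file path, not "https://..."

            var input = doc.DocumentNode.SelectSingleNode("//input[@name='signup_form_id']");
            return input?.GetAttributeValue("value", "") ?? "";
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }

    public static void Main()
    {
        Console.WriteLine(LoadFormIdFromFile());
    }
}
```

For live pages, HtmlWeb.Load(url) does the download and parse in one step, so the file detour is only useful when the HTML is already saved locally.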

Second, the flaws in your XPath:

  • you forgot a closing bracket ]
  • there is no form with the name set to signup_form_id
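Both flaws can be observed offline with a small sketch like the one below, which parses the question's markup (trimmed, with the form closed) from a string via HtmlDocument.LoadHtml. It assumes the HtmlAgilityPack NuGet package is referenced; the class and helper names are hypothetical. HtmlAgilityPack evaluates XPath through System.Xml.XPath, so a malformed expression is expected to surface as an XPathException, and a well-formed expression with no match makes SelectSingleNode return null.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

public static class XPathFlawsDemo
{
    // Trimmed copy of the question's markup (form closed so the sample is complete).
    const string SampleHtml =
        "<form name=\"setupform\" id=\"setupform\" method=\"post\" action=\"/signup/\" target=\"_top\">" +
        "<input type='hidden' name='signup_form_id' value='2079787163' />" +
        "</form>";

    public static HtmlDocument LoadSample()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        return doc;
    }

    // Returns false when the XPath expression cannot even be compiled.
    public static bool Compiles(HtmlDocument doc, string xpath)
    {
        try
        {
            doc.DocumentNode.SelectSingleNode(xpath);
            return true;
        }
        catch (XPathException)
        {
            return false;
        }
    }

    public static void Main()
    {
        var doc = LoadSample();

        // Flaw 1: the predicate is never closed with ']'.
        Console.WriteLine(Compiles(doc, "//form[@name='signup_form_id'"));

        // Flaw 2: with the bracket fixed, no <form> has name="signup_form_id", so the result is null.
        Console.WriteLine(doc.DocumentNode.SelectSingleNode("//form[@name='signup_form_id']") == null);
    }
}
```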

In conclusion, you should modify your code as follows:

using HtmlAgilityPack;

var url = "http://signup.wordpress.com/signup/";

var htmlWeb = new HtmlWeb();
var doc = htmlWeb.Load(url);

var value = doc.DocumentNode.SelectSingleNode("//form[@id='setupform']");
Console.WriteLine(value.OuterHtml);

Update: It's good that you've clarified the question; I had misunderstood the problem initially.

It looks like you're looking for an input tag, not the form, so your XPath should select the input element whose name attribute is signup_form_id.

Here's the code that reads the piece of data you need:

using HtmlAgilityPack;

var url = "http://signup.wordpress.com/signup/";

var htmlWeb = new HtmlWeb();
var doc = htmlWeb.Load(url);

var signupFormIdElement = doc.DocumentNode
    .SelectSingleNode("//input[@name='signup_form_id']");

// SelectSingleNode returns null when nothing matches, so guard before reading the attribute
var signupFormId = signupFormIdElement?.GetAttributeValue("value", "");

Console.WriteLine(signupFormId);
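Because the live page can change (or be unreachable), it may help to check the XPath against the snippet from the question first. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name, method name and trimmed sample markup are hypothetical. It uses HtmlDocument.LoadHtml, which parses an HTML string instead of a file or URL, and returns null when no matching input exists.

```csharp
using System;
using HtmlAgilityPack;

public static class SignupFormIdDemo
{
    // Trimmed copy of the question's markup (form closed so the sample is complete).
    public const string SampleHtml =
        "<form name=\"setupform\" id=\"setupform\" method=\"post\" action=\"/signup/\" target=\"_top\">" +
        "<input type='hidden' name='signup_form_id' value='2079787163' />" +
        "<input type=\"hidden\" id=\"_signup_form\" name=\"_signup_form\" value=\"9783b65654\" />" +
        "</form>";

    // Parses an HTML string and returns the value attribute of the
    // <input name="signup_form_id"> element, or null if there is no such element.
    public static string? ExtractSignupFormId(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var input = doc.DocumentNode.SelectSingleNode("//input[@name='signup_form_id']");
        return input?.GetAttributeValue("value", "");
    }

    public static void Main()
    {
        Console.WriteLine(ExtractSignupFormId(SampleHtml) ?? "(signup_form_id input not found)");
    }
}
```

Keeping the parsing in a method that takes a string also lets the same code run on HTML obtained from HtmlWeb or any other downloader.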

Upvotes: 4
