Reputation: 478
I'm trying to extract a single value from
http://www.dsebd.org/displayCompany.php?name=NBL
The field I need (the last trade price, shown in the attached picture) has this XPath:
XPath: /html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td[1]/p[1]/table[1]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/font
Error: an exception is thrown and no data is found with that XPath: "An unhandled exception of type 'System.Net.WebException' occurred in HtmlAgilityPack.dll"
Source Code:
static void Main(string[] args)
{
    /************************************************************************/
    string tickerid = "Bse_Prc_tick";
    HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load(@"http://www.dsebd.org/displayCompany.php?name=NBL", "GET");
    if (doc != null)
    {
        // Fetch the stock price from the Web page
        string stockprice = doc.DocumentNode.SelectSingleNode(string.Format("./html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td[1]/p[1]/table[1]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/font", tickerid)).InnerText;
        Console.WriteLine(stockprice);
    }
    Console.WriteLine("ReadKey Starts........");
    Console.ReadKey();
}
Upvotes: 2
Views: 1514
Reputation: 1273
Well, I checked: the XPaths we were using are simply incorrect. The real fun starts when you try to find where the error lies.
Just look at the source code of the page you are using: aside from numerous markup errors that break XPath lookups, it even contains multiple <html> tags...
Chrome DevTools, like the tool you were using, works on the DOM tree after the browser has corrected it (everything packed into a single html node, tbody elements added, etc.).
Because the HTML structure itself is broken, HtmlAgilityPack's parse tree is broken as well, and it does not match the tree the browser shows.
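To illustrate the tbody point, here is a small sketch (not part of the original answer) that evaluates XPaths against the raw markup as a parser sees it. To stay dependency-free it uses System.Xml's XmlDocument instead of HtmlAgilityPack, so it only accepts well-formed input; the class name XPathDemo and the sample markup are made up. Like HtmlAgilityPack, XmlDocument does not insert tbody elements, so a path copied from DevTools that contains /tbody/ matches nothing.

```csharp
using System.Xml;

// Hypothetical helper: evaluates an XPath against raw markup, without any browser-style fixups.
// Uses System.Xml instead of HtmlAgilityPack, so the input must be well-formed XML/XHTML.
public static class XPathDemo
{
    // Returns the inner text of the first matching node, or null if the XPath matches nothing.
    public static string Select(string markup, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        return doc.SelectSingleNode(xpath)?.InnerText;
    }
}
```

With the made-up sample <html><body><table><tr><td>Last Trade:</td><td>42.5</td></tr></table></body></html>, the DevTools-style path /html/body/table/tbody/tr/td[2] returns null, while /html/body/table/tr/td[2] returns "42.5". A path anchored on the label text, such as //td[.='Last Trade:']/following-sibling::td[1], also returns "42.5" and does not depend on the exact nesting.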
As things stand, you can either use a regular expression or simply search for known text in the source (which is much faster, but less flexible).
For example:
...
using System.Net; // required for WebClient
...
class Program
{
    // entry point of the console app
    static void Main(string[] args)
    {
        // URL to download
        // "var" lets the compiler infer the type (string here) instead of writing it out
        var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";
        // a using block calls Dispose() on the object as soon as the block ends,
        // releasing its resources instead of waiting for the garbage collector
        using (WebClient client = new WebClient()) // WebClient implements IDisposable
        {
            // simply download the result to a string; in this case it is the page's HTML
            string htmlCode = client.DownloadString(url);
            // cut off everything before "Last Trade:", so the next search
            // starts at the beginning of the string instead of somewhere in the middle
            int start = htmlCode.IndexOf("Last Trade:", StringComparison.Ordinal);
            if (start >= 0)
            {
                htmlCode = htmlCode.Substring(start);
                // select from .. to .. and then remove leading and trailing whitespace characters
                htmlCode = htmlCode.Substring("2\">", "</font></td>").Trim();
                Console.WriteLine(htmlCode);
            }
            else
            {
                Console.WriteLine("\"Last Trade:\" not found on the page");
            }
        }
        Console.ReadLine();
    }
}
// http://stackoverflow.com/a/17253735/3147740 <- copied from here
// This extension class adds the Substring(from, until) overload used above; it does what its comments say.
public static class StringExtensions
{
    /// <summary>
    /// takes a substring between two anchor strings (or the end of the string if that anchor is null)
    /// </summary>
    /// <param name="this">a string</param>
    /// <param name="from">an optional string to search after</param>
    /// <param name="until">an optional string to search before</param>
    /// <param name="comparison">an optional comparison for the search</param>
    /// <returns>a substring based on the search</returns>
    public static string Substring(this string @this, string from = null, string until = null, StringComparison comparison = StringComparison.InvariantCulture)
    {
        var fromLength = (from ?? string.Empty).Length;
        var startIndex = !string.IsNullOrEmpty(from)
            ? @this.IndexOf(from, comparison) + fromLength
            : 0;
        if (startIndex < fromLength) { throw new ArgumentException("from: Failed to find an instance of the first anchor"); }
        var endIndex = !string.IsNullOrEmpty(until)
            ? @this.IndexOf(until, startIndex, comparison)
            : @this.Length;
        if (endIndex < 0) { throw new ArgumentException("until: Failed to find an instance of the last anchor"); }
        var subString = @this.Substring(startIndex, endIndex - startIndex);
        return subString;
    }
}
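The answer also mentions regular expressions as an option. Here is a minimal sketch of that route, assuming the same markup the string search relies on (the value sits between a 2"> anchor and </font></td> somewhere after "Last Trade:"); the class name LastTradeRegex is hypothetical. It is exactly as fragile as the string search: if the page layout changes, the pattern stops matching.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that mirrors the string search with a regular expression.
// Assumes markup shaped like: ...Last Trade:...2">VALUE</font></td>...
public static class LastTradeRegex
{
    // Singleline lets '.' match newlines, since the label and the value may be on different lines.
    // Both '.*?' are lazy, so the match stops at the first anchors after "Last Trade:".
    private static readonly Regex Pattern = new Regex(
        @"Last Trade:.*?2"">(?<value>.*?)</font></td>",
        RegexOptions.Singleline);

    // Returns the trimmed value, or null when the pattern does not match.
    public static string Extract(string html)
    {
        Match match = Pattern.Match(html);
        return match.Success ? match.Groups["value"].Value.Trim() : null;
    }
}
```

Inside the using block, LastTradeRegex.Extract(client.DownloadString(url)) would take the place of the two Substring calls; unlike the extension method, it returns null instead of throwing when the anchors are missing.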
Upvotes: 2