rippergr

Reputation: 184

How can I scrape a table that is created with JavaScript in C#?

I am trying to get a table from the web page https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/ using HtmlAgilityPack.

My code so far is:

    WebClient webClient = new WebClient();
    string page = webClient.DownloadString("https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/");

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(page);

    List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='list_result Result']")
        .Descendants("tr")
        .Skip(1)
        .Where(tr => tr.Elements("td").Count() > 1)
        .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
        .ToList();

My problem is that the web page builds the table with JavaScript. The HTML I download only contains a message saying that I must enable JavaScript, so SelectSingleNode finds no table, returns null, and the code throws a NullReferenceException.

I also tried a plain GET request:

    string Url = "https://www.belastingdienst.nl/rekenhulpen/wisselkoersen/";
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(Url);
    myRequest.Method = "GET";
    WebResponse myResponse = myRequest.GetResponse();
    StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
    string result = sr.ReadToEnd();
    sr.Close();
    myResponse.Close();

with the same result. I have already enabled JavaScript in Internet Explorer and changed the registry as well:

    // In a verbatim (@"...") string a backslash is literal, so the path uses single backslashes.
    if (Environment.Is64BitOperatingSystem)
        Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION", true);
    else  // 32-bit machine
        Regkey = Microsoft.Win32.Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION", true);

If I use a WebBrowser control, the page displays without problems, but I still can't get the table into a list.

Upvotes: 1

Views: 1686

Answers (2)

Ole EH Dufour

Reputation: 3240

F12 is your friend in any browser.

Open the Network tab and you'll notice that all of the data comes from this file:

https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd201806.xml

(I suppose that the data for July 2018 will be at a URL ending in *.dd201807.xml.)

In C#, do a GET for that URL and parse the response as XML; there is no need for HtmlAgilityPack. To pick the right URL, build its suffix from the current year followed by the two-digit month (for example, 201806 for June 2018).
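As a rough sketch of that approach: the class and method names below (WisselkoersenSketch, BuildUrl) are made up for illustration, and the code assumes the dd<year><month> pattern holds for other months, which is only a guess based on the June 2018 file. The layout of the XML is not shown here, so the sketch just lists the first few leaf elements instead of assuming any element names.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public static class WisselkoersenSketch
{
    // URL seen in the Network tab, with the year/month part replaced by {0}.
    // Assumption: other months follow the same naming pattern.
    private const string UrlFormat =
        "https://www.belastingdienst.nl/data/douane_wisselkoersen/wks.douane.wisselkoersen.dd{0}.xml";

    // Builds the URL for the month that contains the given date, e.g. "...dd201806.xml".
    public static string BuildUrl(DateTime date)
    {
        string yearMonth = date.ToString("yyyyMM", CultureInfo.InvariantCulture);
        return string.Format(CultureInfo.InvariantCulture, UrlFormat, yearMonth);
    }

    public static void Main()
    {
        string url = BuildUrl(DateTime.Today);

        using (var client = new WebClient())
        using (Stream stream = client.OpenRead(url))
        {
            // Loading from the stream lets the XML parser pick up the file's declared encoding.
            XDocument doc = XDocument.Load(stream);

            Console.WriteLine("Root element: " + doc.Root.Name.LocalName);

            // The schema is unknown here, so print a few leaf elements to inspect the structure.
            foreach (XElement leaf in doc.Descendants().Where(e => !e.HasElements).Take(20))
            {
                Console.WriteLine("{0} = {1}", leaf.Name.LocalName, leaf.Value.Trim());
            }
        }
    }
}
```

Once the real element names are known from that output, the generic loop can be swapped for a LINQ to XML query that selects exactly the fields needed. If the file stores values in attributes rather than element text, inspect leaf.Attributes() instead.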

I can't make it any more fun than that!

Upvotes: 3

WebClient is an HTTP client, not a web browser, so it won't execute JavaScript. What you need is a headless web browser. See this page for a list of headless web browsers. I have not tried any of them, though, so I cannot give you a recommendation:

Headless browser for C# (.NET)?

Upvotes: 0
