Retrieve information from a div with a specific class using HtmlAgilityPack C#

Question

I'm trying to get the information from all divs with class="top-tournament " using HtmlAgilityPack in C#.

The problem is that the nodes variable is always empty, which suggests I'm not doing it the proper way.

HTML example

I'm using this code:

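    using System.Net.Http;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    class Program
    {
        static void Main(string[] args)
        {
            // Block until the async crawler finishes
            startCrawlerAsync().Wait();
        }

        private static async Task startCrawlerAsync()
        {
            var url = "https://live.soccerstreams.net/home";
            var httpClient = new HttpClient();

            // Download the raw HTML exactly as the server sends it
            var html = await httpClient.GetStringAsync(url);

            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            // Select every <div> whose class attribute is exactly "top-tournament "
            HtmlNodeCollection nodes = htmlDocument.DocumentNode.SelectNodes("//div[@class=\"top-tournament \"]");
        }
    }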

foobar · Accepted Answer

If you look at htmlDocument.ParsedText, you will see that the site above returns JavaScript as part of its body. That JavaScript runs in your browser and builds the HTML you see. HtmlAgilityPack only parses the HTML it is given and cannot execute JavaScript, so the top-tournament divs never exist in the parsed document, and that is why you get null for nodes.
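To see this without hitting the network, here is a small sketch that feeds HtmlAgilityPack a made-up page whose only top-tournament div would be created by an inline script. The markup and the names NoScriptDemo, Html and Parse are hypothetical, for illustration only. The script source is kept verbatim in ParsedText but never executed, so even a loose contains() match finds no div, and SelectNodes returns null (its default behaviour when nothing matches).

```csharp
using System;
using HtmlAgilityPack;

public static class NoScriptDemo
{
    // Hypothetical page: the div only exists after this script runs in a browser.
    public const string Html =
        "<html><body><div id=\"app\"></div><script>" +
        "var d = document.createElement('div');" +
        "d.className = 'top-tournament';" +
        "document.getElementById('app').appendChild(d);" +
        "</script></body></html>";

    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses markup only; <script> content is never executed
        return doc;
    }

    public static void Main()
    {
        HtmlDocument doc = Parse(Html);

        // The script source is still present in the parsed text...
        Console.WriteLine(doc.ParsedText.Contains("createElement"));

        // ...but no div with a top-tournament class exists in the node tree.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[contains(@class, 'top-tournament')]");
        Console.WriteLine(nodes == null
            ? "SelectNodes returned null: no matching div"
            : $"Found {nodes.Count} div(s)");
    }
}
```

Running the same check against the real page's ParsedText is a quick way to confirm whether the markup you want is in the server response at all or only appears after scripts run.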

If you want to use C# for this task, I would recommend looking at the following question: Scraping webpage generated by javascript with C#
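Once you do have HTML that contains the divs (for example, page source captured after a browser has run the scripts), the XPath in the question is still fragile: @class="top-tournament " is an exact string comparison, so it misses class="top-tournament" without the trailing space or class="top-tournament featured". A common XPath 1.0 idiom matches the class as a whitespace-separated token instead. This is a sketch under assumptions: the markup is made up, and ByClass, CountMatches and ClassTokenDemo are hypothetical helpers, not HtmlAgilityPack APIs.

```csharp
using System;
using HtmlAgilityPack;

public static class ClassTokenDemo
{
    // Hypothetical helper (not an HtmlAgilityPack API): builds an XPath that matches
    // a <div> whose class attribute contains className as a whole token.
    public static string ByClass(string className) =>
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";

    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // With default options SelectNodes returns null, not an empty collection, on no match.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes?.Count ?? 0;
    }

    public static void Main()
    {
        // Hypothetical rendered markup: three spellings of the class plus a near miss.
        const string html =
            "<div class=\"top-tournament \">A</div>" +
            "<div class=\"top-tournament\">B</div>" +
            "<div class=\"top-tournament featured\">C</div>" +
            "<div class=\"top-tournaments\">D</div>";

        Console.WriteLine(CountMatches(html, "//div[@class=\"top-tournament \"]")); // exact comparison
        Console.WriteLine(CountMatches(html, ByClass("top-tournament")));          // token match
    }
}
```

Padding both the normalized attribute and the class name with spaces is what keeps top-tournament from also matching top-tournaments, which a plain contains(@class, 'top-tournament') would accept.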
