Download all PDF files from crawled links

Question

When I run the code, ProductListPage is null; the foreach then throws an exception and the program does not proceed.

Any ideas how to solve this? Should I wait until //div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a is found, or is there another approach?

Here is my current code:

HtmlDocument htmlDoc = new HtmlWeb().Load("https://example.com/");
HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
foreach (HtmlNode src in ProductListPage)
{
    htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);

    HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
    if (LinkTester != null)
    {
        foreach (var dllink in LinkTester)
        {
            string LinkURL = dllink.Attributes["href"].Value;
            Console.WriteLine(LinkURL);

            string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
            var DLClient = new WebClient();

            DLClient.DownloadFileAsync(new Uri(LinkURL), @"C:\temp" + ExtractFilename);
        }
    }
}

EDIT:

The code seems to work without a VPN connection, but it fails when the VPN is on. I have an alternative written in Python with BeautifulSoup, and it works regardless of the VPN connection. Any idea why C# with HtmlAgilityPack does not do the trick?

EDIT2:

I have noticed that over the VPN connection the page loads with a slight delay: the page itself loads first, and the content arrives afterwards.

10101 · Accepted Answer

After about 2 months of searching and reading, I finally found a solution. Adding this to app.config worked for me without the need for any code changes:

so my app.config looks like this now:

Please give the original answer the credit for this: https://stackoverflow.com/a/40900485/7202022
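
Separately from the app.config change, the crash itself comes from SelectNodes returning null when the XPath matches nothing, which the outer foreach in the question does not check for. Below is a minimal sketch of a more defensive version of the same loop. The class name PdfCrawler and the helpers SelectOrEmpty and LocalPathFor are hypothetical names made up for this example, and the URL, XPath expressions and C:\temp folder are simply carried over from the question. It does not fix whatever the VPN changes about the response; it only keeps the program from stopping when a page comes back without the expected nodes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

public static class PdfCrawler
{
    // SelectNodes returns null (not an empty collection) when nothing matches,
    // so wrap it to make foreach safe. Hypothetical helper, not part of HtmlAgilityPack.
    public static IEnumerable<HtmlNode> SelectOrEmpty(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes ?? Enumerable.Empty<HtmlNode>();
    }

    // Builds the local file path from the last segment of the link. Hypothetical helper.
    public static string LocalPathFor(Uri link, string targetDir)
    {
        return Path.Combine(targetDir, Path.GetFileName(link.LocalPath));
    }

    public static void Run()
    {
        var web = new HtmlWeb();
        var startUri = new Uri("https://example.com/");   // placeholder URL from the question
        const string targetDir = @"C:\temp";
        Directory.CreateDirectory(targetDir);

        HtmlDocument listPage = web.Load(startUri.AbsoluteUri);
        var productLinks = SelectOrEmpty(listPage,
            "//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a").ToList();

        if (productLinks.Count == 0)
        {
            // Dump what was actually received; over a VPN or proxy this may not be the expected page.
            Console.WriteLine("No product links found. Response was:");
            Console.WriteLine(listPage.DocumentNode.OuterHtml);
            return;
        }

        foreach (HtmlNode product in productLinks)
        {
            string href = product.GetAttributeValue("href", "");
            if (href.Length == 0) continue;

            // Resolve relative links against the start page.
            var productUri = new Uri(startUri, href);
            HtmlDocument productPage = web.Load(productUri.AbsoluteUri);

            foreach (HtmlNode dl in SelectOrEmpty(productPage, "//div[@class='row padt6 padb4']//a"))
            {
                string dlHref = dl.GetAttributeValue("href", "");
                if (dlHref.Length == 0) continue;

                var fileUri = new Uri(productUri, dlHref);
                string localPath = LocalPathFor(fileUri, targetDir);
                Console.WriteLine(fileUri + " -> " + localPath);

                // Synchronous download, so the process cannot exit before the file is written.
                using (var client = new WebClient())
                {
                    client.DownloadFile(fileUri, localPath);
                }
            }
        }
    }
}
```

Call PdfCrawler.Run() from Main. The sketch uses DownloadFile rather than DownloadFileAsync because in a console program the asynchronous calls can still be in flight when Main returns; if parallel downloads matter, that part would need a different approach.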
