Marcin Jurasz

Reputation: 1

c# - WebClient - downloading HTML data to variable

I am dealing with a theoretically simple WebClient request. Everything worked fine until this Sunday; apparently the site has changed something on its side.

Here is what I do (simplified for analysis; HTTP and HTTPS give the same result):

string strRemoteFileNameGPW = @"http://www.gpw.pl/ajaxindex.php?action=GPWQuotations&start=showTable&tab=all&lang=PL&type=&full=1&format=html&download_xls=1";

Next, the following call worked fine, with or without a proxy (I then parsed the result, e.g. with HtmlAgilityPack.HtmlDocument):

using (WebClient webClient = new WebClient())
{
    string strResult = webClient.DownloadString(strRemoteFileNameGPW);
}

I am not sure what changed, so I searched StackOverflow and tried everything I found. Nothing worked, or I am simply blind and have overlooked something obvious.

I captured the network traffic and even added some headers to make the request look more like one from a regular browser:

 webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36");
 webClient.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
 // webClient.Headers.Add("Host", "www.gpw.pl");
 webClient.Headers.Add("Cache-Control", "max-age=0");
 webClient.Headers.Add("Accept-Encoding", "gzip, deflate, br");
 webClient.Headers.Add("Accept-Language", "pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7");
 // webClient.Headers.Add("Upgrade-Insecure-Requests", "1");
 webClient.Headers.Add("sec-ch-ua-mobile", "?0");
 webClient.Headers.Add("sec-ch-ua-platform", "Windows");
 webClient.Headers.Add("Sec-Fetch-Site", "none");
 webClient.Headers.Add("Sec-Fetch-Mode", "navigate");
 webClient.Headers.Add("Sec-Fetch-User", "?1");
 webClient.Headers.Add("Sec-Fetch-Dest", "document");
 webClient.Headers.Add("sec-ch-ua", "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"98\", \"Google Chrome\";v=\"98\"");

Expected result: download the displayed page and save it for further parsing.

Nothing helps. I keep getting the exception "An error occurred while sending the request." with the inner exception "Error: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host".

I disabled the firewall and antivirus and tried on other machines, with the same result. What am I missing? This worked perfectly before. Did they notice my requests and somehow block requests like mine?

Upvotes: 0

Views: 857

Answers (1)

Xerillio

Reputation: 5259

It seems the server expects the Connection: keep-alive header.

By the way, check out the remark in the WebClient documentation: Microsoft does not recommend WebClient for new development. I suggest you use HttpClient instead.

An example that seems to work:

var url = @"http://www.gpw.pl/ajaxindex.php?action=GPWQuotations&start=showTable&tab=all&lang=PL&type=&full=1&format=html&download_xls=1";

using var client = new HttpClient();
client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
client.DefaultRequestHeaders.Accept.ParseAdd("text/html,application/xhtml+xml,application/xml");
client.DefaultRequestHeaders.Connection.ParseAdd("keep-alive");

var strResult = await client.GetStringAsync(url);

See the fiddle in action
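
For completeness, here is a minimal sketch of the same approach as a standalone console program. It assumes C# 8 or later (for `using var` and an async `Main`). The names `GpwDownloader`, `CreateClient` and `DescribeException` are hypothetical helpers introduced for illustration, and the 30-second timeout is an arbitrary choice. It sets the same three headers as the example above and, if the request fails, prints the whole exception chain, so an inner socket error like the one in the question stays visible. Whether the server keeps accepting these headers is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class GpwDownloader
{
    // URL copied from the question.
    public const string Url =
        "http://www.gpw.pl/ajaxindex.php?action=GPWQuotations&start=showTable&tab=all&lang=PL&type=&full=1&format=html&download_xls=1";

    // Hypothetical helper: builds a client that sends the three headers from the answer.
    public static HttpClient CreateClient()
    {
        // The timeout value is an assumption, not something the server requires.
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };
        client.DefaultRequestHeaders.UserAgent.ParseAdd(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
        client.DefaultRequestHeaders.Accept.ParseAdd("text/html,application/xhtml+xml,application/xml");
        client.DefaultRequestHeaders.Connection.ParseAdd("keep-alive");
        return client;
    }

    // Hypothetical helper: joins the messages of an exception and all of its inner exceptions.
    public static string DescribeException(Exception ex)
    {
        var parts = new List<string>();
        for (var current = ex; current != null; current = current.InnerException)
        {
            parts.Add($"{current.GetType().Name}: {current.Message}");
        }
        return string.Join(" -> ", parts);
    }

    public static async Task Main()
    {
        using var client = CreateClient();
        try
        {
            string html = await client.GetStringAsync(Url);
            Console.WriteLine($"Downloaded {html.Length} characters.");
        }
        catch (HttpRequestException ex)
        {
            // Covers connection failures and non-success status codes.
            Console.Error.WriteLine(DescribeException(ex));
        }
        catch (TaskCanceledException ex)
        {
            // GetStringAsync reports a timeout as a canceled task.
            Console.Error.WriteLine("Request timed out: " + ex.Message);
        }
    }
}
```

In a long-running application it is usually better to create the HttpClient once and reuse it for all requests instead of disposing it after each one.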

Upvotes: 1
