CODEWITHSUNDEEP

Reputation: 2611

WebClient.DownloadFile 404 errors with HTML characters in URI?

I'm using the WebClient class to download files from a web site and have a couple of questions.

When the URIs have HTML characters in the URI path (e.g. http://foo.com/path1&path2.pdf) I get 404 (not found) errors. How can I prevent this? I thought HTML characters were safe?
When the URIs represent a directory (e.g. http://foo.com/path) I get 403 (forbidden) errors. I understand why this is occurring, but how can I test my URI to see if it represents a directory with no index page?

Upvotes: 1

Views: 1393

Answers (1)

Reputation: 56448

HTML encoded characters are not safe for URLs. You need to URL encode them. If your data is stored HTML encoded, you'll want to use HttpUtility.HtmlDecode first to get to a properly formatted URL (i.e. foo.com/page?foo=1&bar=2). If you have special characters that must go in the URL itself, like ampersands that are not part of the query portion of the URL, you'll want to URL encode them with HttpUtility.UrlEncode.
As for testing whether a URI represents a directory with no index page: you can't.
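
A minimal sketch of the decode and encode steps described above, assuming the System.Web HttpUtility class is available to your project. The class name UrlFixups and both method names are hypothetical helpers made up for illustration, not part of any library:

```csharp
using System;
using System.Web; // HttpUtility

// Hypothetical helper class, for illustration only.
public static class UrlFixups
{
    // Undo HTML encoding in a stored URL, e.g. "&amp;" becomes "&".
    public static string FromHtmlEncoded(string storedUrl)
    {
        return HttpUtility.HtmlDecode(storedUrl);
    }

    // URL encode a single path segment so that reserved characters
    // such as '&' are percent-encoded. Encode only the segment, not the
    // whole URL, or the "://" and "/" separators get encoded too.
    public static string EncodeSegment(string segment)
    {
        return HttpUtility.UrlEncode(segment);
    }
}
```

With these, UrlFixups.FromHtmlEncoded("http://foo.com/page?foo=1&amp;bar=2") gives http://foo.com/page?foo=1&bar=2, and "http://foo.com/" + UrlFixups.EncodeSegment("path1&path2.pdf") gives http://foo.com/path1%26path2.pdf, which can then be passed to WebClient.DownloadFile. One caveat: HttpUtility.UrlEncode turns spaces into "+", so for path segments that may contain spaces Uri.EscapeDataString (which emits %20) is probably the safer choice.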

Upvotes: 3
