Sophivorus

Reputation: 3083

Download files with non-ASCII characters in the name

My website allows users to upload files with any name. Some names, of course, will have non-ASCII characters. When a user uploads a file, I save it in a folder with its original name. However, when I try to download it, by accessing its location (for example, files/Tolstoy - How much land does a man need?.pdf), I get a 404. Is there some way to solve this, so that the files remain with their original name? Via Apache, maybe?

Upvotes: 0

Views: 2095

Answers (3)

eis

Reputation: 53513

Just use URL encoding, also known as percent-encoding. That is exactly what it is for: representing arbitrary characters in URLs on the web. Every URL you print into HTML should be URL-encoded.

For PHP, use rawurlencode(): it follows RFC 3986 and encodes a space as %20. urlencode() produces form encoding instead, where a space becomes +, and that is not what you want in a URL path.
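A minimal sketch of the difference, assuming a made-up file name and the files/ folder from the question (neither is your actual code):

```php
<?php
// Hypothetical file name, for illustration only.
// "\xC3\xA9" is the UTF-8 byte sequence for a precomposed é.
$name = "Caf\xC3\xA9 menu?.pdf";

// rawurlencode() follows RFC 3986: the space becomes %20, "?" becomes %3F
// (so it no longer starts a query string), and each UTF-8 byte gets its own %XX.
$href = 'files/' . rawurlencode($name);
echo $href, "\n";   // files/Caf%C3%A9%20menu%3F.pdf

// urlencode() uses form encoding: the space becomes "+",
// which is a literal plus sign when it appears in a URL path.
$formEncoded = 'files/' . urlencode($name);
echo $formEncoded, "\n";   // files/Caf%C3%A9+menu%3F.pdf

// URL-encode first, then escape for HTML when printing the link.
echo '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($name) . '</a>', "\n";
```

Only the file name segment is encoded here. Encoding the whole path would also turn the / separators into %2F.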

Edit: for this issue

PHP encodes "é" as "e%26%23769%3B", instead of "e%CC%81"

e%CC%81 is the percent-encoded UTF-8 for é written as an e followed by the combining acute accent (U+0301). e%26%23769%3B is the percent-encoding of e&#769;, which is the HTML numeric character reference for the same thing. This means that either you are calling htmlentities() explicitly before URL-encoding, or your server setup does that automatically. It is not strictly needed if the proper character sets are in place (only an htmlspecialchars() call is actually needed), but it shouldn't break anything either.
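If you want to check this yourself, here is a small sketch that unpacks the string from the question step by step. The variable names are mine, and it assumes UTF-8 throughout:

```php
<?php
// The strangely encoded string from the question.
$encoded = 'e%26%23769%3B';

// Step 1: undo the percent-encoding. What is left is an HTML character reference.
$entity = rawurldecode($encoded);
echo $entity, "\n";   // e&#769;

// Step 2: decode the reference into real characters: "e" followed by U+0301.
$text = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Step 3: percent-encode the real characters to get the expected form.
$expected = rawurlencode($text);
echo $expected, "\n";   // e%CC%81
```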

Some online tools if you want to test these out:

Upvotes: 1

Sophivorus

Reputation: 3083

Well, for some reason that I still don't understand, using rawurlencode() instead of urlencode() made it work.

However, the character é (among others, I'm sure) is still being encoded strangely (e%26%23769%3B instead of simply %C3%A9). Even stranger is that the links containing it work.

Upvotes: 0

ern0

Reputation: 3172

Workaround: convert filenames to ASCII at upload. You will be happy with it.
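One possible way to do that, as a rough sketch: ascii_filename() is a hypothetical helper, and the set of characters it keeps is just a conservative assumption.

```php
<?php
// Hypothetical helper: reduce an uploaded file name to a conservative ASCII subset.
function ascii_filename(string $name): string
{
    // Replace every run of bytes outside A-Z, a-z, 0-9, dot, underscore
    // and hyphen with a single underscore.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '_', $name);

    // Drop leading and trailing dots and underscores (avoids names like ".." or ".htaccess").
    $safe = trim($safe, '._');

    // Fall back to a placeholder if nothing usable is left.
    return $safe === '' ? 'file' : $safe;
}

echo ascii_filename("Caf\xC3\xA9 menu?.pdf"), "\n";   // Caf_menu_.pdf
```

Keep in mind that different original names can map to the same ASCII name, so you would still need to handle collisions, and store the original name somewhere if you want to show it to users.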

Upvotes: 0
