out_sid3r

Reputation: 1028

Download link to Linux web server character encoding mismatch

I have a web server running Ubuntu, and the web page has some links pointing to downloadable files on the server. The issue is that I'm getting 404 (Not Found) errors due to character encoding.

On the website, a download link contains Luís, but the file name on the server is displayed as Lu�s when I run ls.

Links to files without this kind of character work without any issue, but if the name has "special" characters, I get a 404.

Any ideas on how to fix this?

Update: when I run locale I get:

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Isn't that right? I mean, it's using UTF-8, right?

Upvotes: 1

Views: 476

Answers (2)

Esailija

Reputation: 140236

The link should be Lu%C3%ADs (Luís). Unfortunately, the file name on your server is actually Lu%EF%BF%BDs (Lu�s), which means it was never created correctly in the first place. If it was created programmatically, the program assumed the wrong encoding when decoding the file name.
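As a rough illustration of the two percent-encoded forms above, here is a small Python sketch using the standard library's urllib.parse.quote. The idea that the name arrived as Latin-1 bytes and was then decoded as UTF-8 is an assumption made for the example; nothing in the question confirms how the file was actually created.

```python
from urllib.parse import quote

# What the link should contain: "Luís" encoded as UTF-8, then percent-encoded.
good_link = quote("Luís")

# Assumption for illustration: the creating program received the name as
# Latin-1 bytes and decoded them as UTF-8, so the invalid byte 0xED was
# replaced with U+FFFD (the replacement character).
latin1_bytes = "Luís".encode("latin-1")
damaged_name = latin1_bytes.decode("utf-8", errors="replace")

# What a link to the damaged name looks like.
damaged_link = quote(damaged_name)

print(good_link)
print(damaged_link)
```

Once the original byte has been replaced by U+FFFD, the original character can no longer be recovered from the file name alone.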

If you see Lu�s with ls, then your console is definitely not in UTF-8; it would show Lu�s if it were. But that would only show that the file name was broken in the first place. You need to fix the code that creates these files.

For now you should be able to download the file with the link Lu%EF%BF%BDs, but that's not a real solution, because any non-ASCII character in a file name created by the faulty code would be %EF%BF%BD in a URL.
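To see which bytes a file name really contains, independent of how the console renders them, one option is to list the directory as bytes and percent-encode each name. This is a minimal sketch; the function name percent_encoded_names and the directory argument are hypothetical.

```python
import os
from urllib.parse import quote

def percent_encoded_names(directory):
    """Map each raw file name in `directory` to its percent-encoded form."""
    # A bytes path makes os.listdir return raw bytes, so no decoding
    # (and no locale guessing) is involved.
    raw_names = os.listdir(os.fsencode(directory))
    return {raw: quote(raw) for raw in sorted(raw_names)}

if __name__ == "__main__":
    for raw, encoded in percent_encoded_names(".").items():
        print(raw, "->", encoded)
```

A name that prints as Lu%C3%ADs is valid UTF-8, Lu%EDs would suggest Latin-1 on disk, and Lu%EF%BF%BDs means the replacement character itself was stored.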

This is all I can say from your question.

Upvotes: 1

fycth

Reputation: 3489

Which locale do you use on your web server? Ideally, your server locale and the character encoding of your HTML pages should be identical.

I mean, you should use UTF-8 as the server locale and UTF-8 as the encoding of your web pages.

If your HTML link is encoded in UTF-8 but your server uses a Latin-1 locale, for example, you will get a similar issue.

So you need to check your server's locale, and ls should show you exactly the same file name you use in your HTML link.
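If ls turns out to show names stored in Latin-1 while the pages link to UTF-8 names, one possible repair is to re-encode the names on disk. The Python sketch below assumes that every name that is not valid UTF-8 was written as Latin-1, which nothing in this thread confirms; the helper names to_utf8_name and rename_to_utf8 are hypothetical. It cannot recover names that already contain a literal replacement character.

```python
import os

def to_utf8_name(raw):
    """Return `raw` unchanged if it is valid UTF-8, else re-encode it from Latin-1."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was written as Latin-1.
        return raw.decode("latin-1").encode("utf-8")

def rename_to_utf8(directory, dry_run=True):
    """Rename non-UTF-8 file names in `directory`; return the (old, new) pairs."""
    directory = os.fsencode(directory)
    changes = []
    for raw in sorted(os.listdir(directory)):
        fixed = to_utf8_name(raw)
        if fixed != raw:
            changes.append((raw, fixed))
            if not dry_run:
                os.rename(os.path.join(directory, raw),
                          os.path.join(directory, fixed))
    return changes
```

Running it with dry_run=True first shows the planned renames without touching any files.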

UPDATED

How to check the locale on Linux: just run locale.

How to check Apache's default character set (if you're using Apache as your web server): open httpd.conf and look for a line like this: AddDefaultCharset utf-8

Upvotes: 1
