Index

Reputation: 2381

Charset is utf-8 in Eclipse, reported as us-ascii when deployed on server

In relation to a previous question of mine, where I am getting "ghost-like" errors, it was suggested that I check whether the character encoding of my file is correct.

The file in question is a PHP file created with the Eclipse PDT plugin. The file was created as UTF-8, and Eclipse still reports it as UTF-8 encoded. However, when I deploy the file to my Ubuntu / Apache2 production server, it is reported by the

$ file -bi

command as having US-ASCII encoding. But I can open and read the file just fine on the server (using, for example, Nano), and all characters appear correctly (no ? or other stand-in symbols).

I've transferred the file in the same way I've done with several others, using scp or sftp.

So my question is this: Is $ file -bi reliable, or should I just ignore this, since the file can be opened and read fine?

Upvotes: 1

Views: 877

Answers (2)

hakre

Reputation: 197722

The file command works fine. It tells you the best it can find out. That means that if your PHP file has no BOM and contains only bytes that match US-ASCII, it will report it as such.

However, this does not mean that you have configured Eclipse incorrectly. US-ASCII is a subset of UTF-8; UTF-8 was designed to be backwards compatible with it.

So only if the PHP file contains a character that cannot be represented in US-ASCII will the file command be able to report something other than US-ASCII.
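The subset relationship can be checked directly. The following is a minimal sketch in Python; the two sample strings and the helper name same_bytes_in_both are made up for illustration and are not taken from the question.

```python
# Hypothetical sample contents; the second one contains a non-ASCII character.
ascii_only = "<?php echo 'hello'; ?>"
with_accent = "<?php echo 'héllo'; ?>"


def same_bytes_in_both(text: str) -> bool:
    """Return True if text encodes to identical bytes as US-ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # At least one character has no US-ASCII representation.
        return False


if __name__ == "__main__":
    print(same_bytes_in_both(ascii_only))   # pure ASCII text
    print(same_bytes_in_both(with_accent))  # text with an accented letter
```

For the pure-ASCII string, the bytes on disk are the same whichever of the two encodings was chosen when saving, so nothing in the file itself can distinguish them.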

Keep in mind that the character encoding is always information you keep alongside the data. If you lose that association and no longer know the encoding, things often break, because the encoding cannot be reliably guessed.

The file command example shows this. That command must guess (it has no other information, only the data in the file itself) and therefore tells you its best guess (and that's fine). However, do not expect it to work differently.
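To make the idea of a best guess concrete, here is a deliberately simplified sketch in Python. It is not how the file command is implemented; the function name guess_charset and its three possible answers are assumptions chosen for illustration.

```python
def guess_charset(data: bytes) -> str:
    """Guess a charset from raw bytes alone (simplified illustration only)."""
    if data.startswith(b"\xef\xbb\xbf"):
        # A UTF-8 byte order mark is an explicit hint.
        return "utf-8"
    if all(byte < 0x80 for byte in data):
        # Every byte is in the 7-bit range, so US-ASCII is the narrowest fit,
        # even if the author saved the file "as UTF-8".
        return "us-ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # High bytes that are not valid UTF-8: could be Latin-1 or many others.
        return "unknown"


if __name__ == "__main__":
    print(guess_charset(b"<?php echo 'hello'; ?>"))
    print(guess_charset("<?php echo 'héllo'; ?>".encode("utf-8")))
```

The guess changes as soon as a single non-ASCII character is added to the file, even though the encoding the editor was told to use never changed.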

So the file command is fine; just don't place the wrong expectations on it. Use the right tool for the job. Fileinfo is informative, not binding. Inside Eclipse, you specify the encoding in which the file is saved. That is binding.

Upvotes: 2

bmargulies

Reputation: 100032

The file command 'sniffs' your file. If it contains only ISO-646 characters (ISO-646 being a subset of UTF-8), file will report 'ASCII'.

The file command is almost entirely irrelevant to how your file is served by your Apache server. The question is: what Content-Type header is Apache supplying? You need to use the dev tools in your browser, or some other tool, to see. If that header is wrong, you need to fix your Apache configuration.
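Once you have copied the header value from the browser's dev tools, the declared charset can be read out of it. This is a small sketch using Python's standard library; the helper name charset_from_content_type and the sample header values are hypothetical, not output from the asker's server.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    # Hypothetical header values, as they might appear in a response.
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

If no charset is declared, the browser falls back to its own detection or defaults, which is where a wrong or missing server setting would show up.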

Upvotes: 1
