Reputation: 42
I'm searching there for a long time, but without any helpful result.
I'm developing a PHP project using eclipse on a Ubuntu 11.04 VM. Every thing works fine. I've never need to look for the file encoding. But after deploying the project to my server, all contents were shown with the wrong encoding. After a manual conversion to UTF8 with Notepad++ my problems were solved.
Now I want to change it in my Ubuntu VM, too. And there's the problem. I've checked the preferences in Eclipse but every property ist set to UTF8: General content types, workspace, project settings, everything ...
If I look for the encoding on the terminal, it says "test_new.dat: text/plain; charset=us-ascii". All files are saved to ascii format. If I try to create a new file with the terminal ("touch") it's also the same.
Then I've tried to convert the files with iconv:
iconv -f US-ASCII -t UTF8 -o test.dat test_new.dat
But the encoding doesn't change. Especially PHP files seems to be resistant. I have some *.ini files in my project for which a conversion works?!
Any idea what to do?
Here are my locale settings of Ubuntu:
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Upvotes: 0
Views: 755
Reputation: 36
I was also wondering about character encoding and found something that might be usefull here.
When I create a new empty .txt-file on my ubuntu 12.04 and ask for its character encoding with: "file -bi filename.txt" it shows me: charset=binary. After opening it and writing something inside like "haha" I saved it using "save as" and explicitly chose UTF-8 as character encoding. Now very strangely it did not show me charset=UTF-8 after asking again, but returned charset=us-ascii. This seemed already strange. But it got even stranger, when I did the whole thing again but this time included some german specific charakters (ä in this case) in the file and saved again (this time without saving as, I just pressed save). Now it said charset=UTF-8.
It therefore seems that at least gedit is checking the file and downgrading from UTF-8 to us-ascii if there is no need for UTF-8 since the file can be encoded using us-ascii.
Hope this helped a bit even though it is not php related.
Greetings
Upvotes: 1
Reputation: 111259
UTF-8 is compatible with ASCII. An ASCII text file is therefore also valid UTF-8, and a conversion from ASCII to UTF-8 is a no-op.
Upvotes: 1