I want to convert a text file from ASCII encoding to UTF-8. So far I have tried this:
open( my $test, ">:encoding(UTF-8)", $test_file ) or die("Error: Could not open '$test_file': $!\n");
and then ran the following command to check the file's encoding:
file $test_file
test_file: ASCII text
Please let me know if I am missing something here.
You are doing it correctly.
ASCII is a subset of UTF-8.
          decode         encode
ASCII       ⇒  Unicode     ⇒  UTF-8
----------     ----------     ----------
00             U+0000         00
01             U+0001         01
02             U+0002         02
⋮              ⋮              ⋮
7E             U+007E         7E
7F             U+007F         7F
----------     ----------     ----------
ASCII       ⇐  Unicode     ⇐  UTF-8
          encode         decode
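The table can be checked mechanically. A minimal sketch using Perl's core Encode module: it decodes every ASCII byte (0x00 to 0x7F) to a Unicode character, re-encodes that character as UTF-8, and counts how many rows come out as anything other than the same single byte. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Walk every row of the table: ASCII byte => Unicode code point => UTF-8 bytes.
my $mismatches = 0;
for my $byte (0x00 .. 0x7F) {
    my $char  = decode('ascii', chr($byte));   # ASCII   => Unicode
    my $bytes = encode('UTF-8', $char);        # Unicode => UTF-8
    # A row matches if the code point equals the byte value and the
    # UTF-8 form is exactly one byte with that same value.
    $mismatches++
        unless ord($char) == $byte
            && length($bytes) == 1
            && ord($bytes) == $byte;
}
printf "%d of 128 ASCII bytes changed when re-encoded as UTF-8\n", $mismatches;
```

It prints `0 of 128 ASCII bytes changed when re-encoded as UTF-8`, which is the table's point in executable form.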
As such, an ASCII file is a UTF-8 file.[1]
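If you still want an explicit "conversion" step, for example so that a stray non-ASCII byte makes the script fail instead of passing through silently, a minimal sketch follows. The helper name `ascii_to_utf8` and its arguments are hypothetical. It reads the input as raw bytes, decodes them strictly as ASCII (croaking on any byte of 0x80 or above), and writes the result through the same `:encoding(UTF-8)` layer used in the question. For valid ASCII input the output bytes are identical to the input bytes.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: copy $in_path to $out_path as UTF-8, dying if
# $in_path contains any byte outside the ASCII range (0x00-0x7F).
# Returns the number of characters written.
sub ascii_to_utf8 {
    my ($in_path, $out_path) = @_;

    open(my $in, '<:raw', $in_path)
        or die("Error: Could not open '$in_path': $!\n");
    my $octets = do { local $/; <$in> } // '';   # slurp the whole file
    close($in);

    # Strict decode: croaks on the first byte that is not valid ASCII.
    my $text = decode('ascii', $octets, FB_CROAK);

    open(my $out, '>:encoding(UTF-8)', $out_path)
        or die("Error: Could not open '$out_path': $!\n");
    print {$out} $text;
    close($out) or die("Error: Could not close '$out_path': $!\n");
    return length($text);
}
```

Usage would be something like `ascii_to_utf8('input.txt', 'output.txt');` (file names are placeholders). Running `file` on the output will still report ASCII text, for the reason explained here.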
When you only use that subset, the file command identifies the file as being encoded using ASCII.
$ perl -M5.010 -e'use utf8; use open ":std", ":encoding(UTF-8)"; say "abcdef"' | file -
/dev/stdin: ASCII text
Going outside that subset causes the file command to identify the file as text encoded using UTF-8.
$ perl -M5.010 -e'use utf8; use open ":std", ":encoding(UTF-8)"; say "abcdéf"' | file -
/dev/stdin: UTF-8 Unicode text
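The distinction file draws in the two examples can be approximated in a few lines. This is a rough sketch, not file's actual algorithm (libmagic applies many more heuristics); the sub name `classify` and its return labels are made up for illustration. Data with no byte above 0x7F is reported as ASCII; anything else that decodes cleanly as strict UTF-8 is reported as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical classifier mimicking the ASCII/UTF-8 split shown above.
sub classify {
    my ($octets) = @_;
    # No byte outside 0x00-0x7F: the data is pure ASCII (and also valid UTF-8).
    return 'ASCII' unless $octets =~ /[^\x00-\x7F]/;
    # Otherwise, check whether the bytes form well-formed UTF-8.
    my $ok = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'UTF-8' : 'neither';
}

print classify("abcdef\n"), "\n";              # bytes of the first example
print classify("abc\x{C3}\x{A9}f\n"), "\n";    # "abcdéf" encoded as UTF-8
print classify("abc\x{E9}f\n"), "\n";          # "abcdéf" encoded as Latin-1
```

This prints `ASCII`, `UTF-8` and `neither`. The first case is why the converted file from the question still shows up as ASCII text.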
Any file that is in ASCII (i.e., containing only code points 0 to 127) is already in UTF-8. The bytes are identical in both encodings, so there is no way for file to identify it as UTF-8.
The two encodings differ only for characters with code points of 128 and above.
It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well.
(From the Wikipedia article on UTF-8)
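To see where the encodings start to differ, a short sketch with Perl's core Encode module prints the UTF-8 bytes for a few code points on either side of 128. The helper name `utf8_hex` and the chosen code points are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the UTF-8 encoding of $char as space-separated hex bytes.
sub utf8_hex {
    my ($char) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', encode('UTF-8', $char);
}

# Code points below 128 stay a single byte with the same value;
# from 128 (U+0080) upward they become multi-byte sequences.
for my $cp (0x65, 0x7F, 0x80, 0xE9, 0x20AC) {
    printf "U+%04X => %s\n", $cp, utf8_hex(chr $cp);
}
```

Expected output: `U+0065 => 65`, `U+007F => 7F`, `U+0080 => C2 80`, `U+00E9 => C3 A9` and `U+20AC => E2 82 AC`, one per line.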