Reputation: 21625
Perl has the -C option for processing UTF-8. Is it possible to tell a Perl one-liner that its input is UTF-16 encoded? A BEGIN block could be used to set the encoding explicitly, but is there a simpler way?
Upvotes: 2
Views: 824
Reputation: 6378
Can Encode do what you want? You might then have to call encode() and decode() in your script, so it might be no shorter than:
perl -nE 'BEGIN {binmode STDIN, ":encoding(utf16)" } ; ...'
There is a PERL_UNICODE environment variable, but it is fairly limited: if I recall correctly, it simply mimics -C.
I once tried to find out why there aren't -C switches for other "popular" forms of UTF. It seemed to come down to whether they are frequently used, whether they are well understood (endianness sometimes counts, who knew?), and whether they are, or should be, obsolete. In other words, it's not as simple as it seems.
perl -MEncode -E 'say for Encode->encodings(":all")'
will show ~ 9 different UTF encodings.
In addition to the usual suspects (perlrun, perlunitut, perlunicode, etc.), one of the most interesting Perl resources on Unicode is right here on Stack Overflow and makes for fascinating reading.
Cf. @Leon Timmerman's example and perldoc open, which is fairly thorough:
% perl -Mopen=":std,:encoding(utf-16)" -E 'print <>' UTF16.txt > other.txt
% file other.txt
other.txt: Big-endian UTF-16 Unicode text, with CRLF line terminators
I will try to find a real example using Encode that preserves the encoding and can be one-lined. It would be a "round trip", something like this:
% file UTF16.txt
UTF16.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
... slurp it up and redirect it to a different file:
% perl -0777 -MEncode="encode,decode" -E '
$text = decode("UTF-16LE", <>) ;
print encode("UTF-16LE", $text)' UTF16.txt > other.txt
% file other.txt
other.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
Then diff the two files and print the size of each in bytes:
% diff UTF16.txt other.txt
% perl -E 'say [stat]->[7] for @ARGV' UTF16.txt other.txt
2220
2220
Upvotes: 3
Reputation: 30225
You can do that using perl -Mopen=":std,IN,:encoding(utf-16)" -e '...'
Upvotes: 4