Thomson

Reputation: 21625

Process text as utf-16 via perl one-liner?

Perl has the -C option for processing UTF-8. Is it possible to tell a Perl one-liner that its input is UTF-16 encoded? A BEGIN block could be used to set the encoding explicitly, but is there a simpler way?

Upvotes: 2

Views: 824

Answers (2)

G. Cito

Reputation: 6378

Can Encode do what you want? You would then have to call decode() and encode() in your script, so it might be no shorter than:

    perl -nE 'BEGIN {binmode STDIN, ":encoding(utf16)" } ; ...'
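A minimal sketch of what that BEGIN block does, assuming an in-memory filehandle as a stand-in for STDIN and made-up sample text. The `UTF-16` layer reads the byte-order mark (BOM) to pick the byte order; according to the Encode::Unicode docs, `UTF-16` input without a BOM makes decoding die, so BOM-less data needs `UTF-16LE` or `UTF-16BE` instead. Note also that `binmode STDIN` only affects STDIN: files named on the command line are opened separately by `<>`.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up sample: a little-endian BOM followed by UTF-16LE text.
my $bytes = "\xFF\xFE" . encode('UTF-16LE', "caf\x{e9}\nna\x{ef}ve\n");

# In-memory handle standing in for STDIN, so the example is self-contained.
open my $in, '<', \$bytes or die "cannot open in-memory handle: $!";

# Same idea as BEGIN { binmode STDIN, ':encoding(UTF-16)' }:
# push a decoding layer; 'UTF-16' consumes the BOM and uses it to pick LE or BE.
binmode $in, ':encoding(UTF-16)' or die "binmode failed: $!";

my @lines = <$in>;
close $in;
chomp @lines;

# Print character counts rather than the text, to sidestep output encoding.
print scalar(@lines), " lines, lengths ", join(', ', map { length } @lines), "\n";
```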

There is a PERL_UNICODE environment variable, but it is fairly limited: it is simply equivalent to the -C switch, so it offers no UTF-16 option either.

I once tried to find out why there are no -C switches for other "popular" forms of UTF. It seemed to come down to how frequently they are used, whether they are well understood (endianness sometimes matters, who knew?), whether they are (or should be) obsolete, and so on. In other words, it's not as simple as it seems.

Cf. @Leon Timmermans's example and perldoc open, which is fairly thorough:

    % perl -Mopen=":std,:encoding(utf-16)" -E 'print <>' UTF16.txt > other.txt
    % file other.txt
    other.txt: Big-endian UTF-16 Unicode text, with CRLF line terminators
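If UTF16.txt here is the little-endian file shown further down, the big-endian result is expected: when no endianness is given, Encode's `UTF-16` encoder always writes big-endian with a BOM, while `UTF-16LE` and `UTF-16BE` write no BOM at all. A small sketch comparing the two (the string "hi" is just an example):

```perl
use strict;
use warnings;
use Encode qw(encode);

# No endianness suffix: encode() emits a BOM (FE FF) followed by big-endian units.
my $generic = encode('UTF-16', "hi");

# Explicit little-endian: no BOM, low byte first.
my $little = encode('UTF-16LE', "hi");

print 'UTF-16:   ', unpack('H*', $generic), "\n";   # feff00680069
print 'UTF-16LE: ', unpack('H*', $little),  "\n";   # 68006900
```

To keep little-endian output, name the encoding explicitly (for example `:encoding(UTF-16LE)`) and remember that such a layer will neither write nor strip a BOM.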


Edit: Another recent discussion, on how to "Turn Off" binmode(STDOUT, ":utf8") Locally, touches on PerlIO and "layers" and has a neat solution that might lend itself to a one-liner. See also UTF-16 perl input output.

I will try to find a real example that uses Encode to preserve the encoding and fits in a one-liner. It would be something like this "round trip":

    % file UTF16.txt
    UTF16.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

Slurp it up and redirect it to a different file:

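    % perl -0777 -MEncode="encode,decode" -E '
      $text = decode("UTF-16LE", <>);
      print encode("UTF-16LE", $text)' UTF16.txt > other.txt
    % file other.txt
    other.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

Here -0777 reads the whole file at once; -00 would read in paragraph mode instead, splitting on runs of consecutive \n bytes.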

Then diff the two files and print each one's size in bytes:

    % diff UTF16.txt other.txt
    % perl -E 'say [stat]->[7] for @ARGV' UTF16.txt other.txt
    2220
    2220
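The round trip is lossless because `decode("UTF-16LE", ...)` does not strip the BOM: with an explicit LE/BE name, Encode treats it as an ordinary U+FEFF character, and `encode("UTF-16LE", ...)` writes it straight back. A sketch of the same check in script form, with an in-memory byte string (made-up content) standing in for UTF16.txt:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for the slurped UTF16.txt: LE BOM plus CRLF-terminated lines.
my $original = "\xFF\xFE" . encode('UTF-16LE', "line one\r\nline two\r\n");

# Explicit UTF-16LE: the BOM survives decoding as the character U+FEFF.
my $text = decode('UTF-16LE', $original);

# Re-encoding with the same explicit name turns U+FEFF back into FF FE.
my $copy = encode('UTF-16LE', $text);

printf "first char: U+%04X\n", ord substr($text, 0, 1);
printf "sizes: %d %d, identical: %s\n",
    length $original, length $copy, ($original eq $copy ? 'yes' : 'no');
```

Decoding with plain `UTF-16` instead would consume the BOM, and encoding with plain `UTF-16` would write a big-endian BOM, so the bytes would no longer match.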

Upvotes: 3

Leon Timmermans

Reputation: 30225

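You can do that using

    perl -Mopen=":std,IN,:encoding(utf-16)" -e '...'

Because of IN, the layer applies only to input (and, via :std, to STDIN); STDOUT keeps its default layers.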
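The same pragma works in a script. A minimal sketch, assuming a temporary file created with File::Temp and made-up content; it uses IN without `:std`, so only opens in this lexical scope that name no layers of their own pick up the decoding layer:

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Script form of -Mopen="IN,:encoding(utf-16)": default layer for input opens
# in this lexical scope that don't specify layers themselves.
use open IN => ':encoding(UTF-16)';

# Write a small UTF-16LE file with a BOM; :raw so the bytes go out untouched.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "\xFF\xFE", encode('UTF-16LE', "alpha\nbeta\n");
close $out or die "close failed: $!";

# No layer in the mode string, so the pragma's IN default is applied.
open my $in, '<', $path or die "cannot open $path: $!";
chomp(my @lines = <$in>);
close $in;

print join('|', @lines), "\n";   # alpha|beta
```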

Upvotes: 4
