user15964

Reputation: 2639

Difference in Python and Perl print regarding character encoding

I am on a Windows system.

I created two UTF-8 files, python_print.py for Python and perl_print.pl for Perl. Both contain the same line:

print("中")

and the Perl version ends with a ; terminator.

My CMD is in code page 936 by default, and I run

python python_print.py

I got

However, when I run

perl perl_print.pl

for the first time, it gives

Running it a second time, I got

Why?

I continued testing: I ran chcp 65001 to change the CMD code page to UTF-8, and this time both Python and Perl give the correct "中".

Now I am completely confused. It seems that print in Python and print in Perl behave quite differently. Does Perl always print the UTF-8 bytes, while Python's print detects the CMD code page and prints the correct bytes? Can somebody explain my test results?

Upvotes: 2

Views: 254

Answers (1)

ysth

Reputation: 98398

Perl is printing the literal bytes you have in your source file. It sees the string as "\xe4\xb8\xad" unless you explicitly declare that your source file is UTF-8 with use utf8;.
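To see why those raw bytes look wrong on a code page 936 console, here is a small Python sketch (not Perl, and only an illustration of the byte values involved). It assumes Python's cp936 codec is a fair stand-in for how the console interprets bytes; the variable names are made up for the example.

```python
# UTF-8 bytes of the character, i.e. what is stored in the source file
# and what Perl passes through unchanged without "use utf8;".
utf8_bytes = "中".encode("utf-8")
print(utf8_bytes)   # b'\xe4\xb8\xad'

# The bytes a code page 936 console expects for the same character.
cp936_bytes = "中".encode("cp936")
print(cp936_bytes)  # b'\xd6\xd0'

# Assumption: decoding with cp936 approximates what the console does
# with the raw UTF-8 bytes. The result is not the original character.
misread = utf8_bytes.decode("cp936", errors="replace")
print(ascii(misread))  # printed as an ASCII-safe repr
```

The two encodings share no bytes for this character, so a cp936 console cannot show it correctly from UTF-8 output. After chcp 65001 the console expects UTF-8, which is why the unmodified bytes then display as "中".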

Once you do that, you would (if you enabled warnings, as you should) get a "Wide character in print" warning; Perl requires you to specify the encoding to use when outputting non-ASCII characters. You can do that with use open ':std' => ':encoding(cp936)'; or with binmode STDOUT, ':encoding(cp936)'; or (for some filehandle you are opening) with the third argument to open.
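For comparison, here is a rough Python analogue of putting an encoding layer on an output handle. It is a sketch, not a description of how either interpreter is implemented: encoded_output is a hypothetical helper, and an in-memory io.BytesIO stands in for the console so the emitted bytes can be inspected.

```python
import io


def encoded_output(text, encoding):
    """Return the bytes a text stream with the given encoding would emit."""
    raw = io.BytesIO()  # stands in for the console's byte stream
    # The wrapper plays the role of an encoding layer on the handle.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave the underlying buffer open and untouched
    return data


print(encoded_output("中", "cp936"))  # b'\xd6\xd0'
print(encoded_output("中", "utf-8"))  # b'\xe4\xb8\xad'
```

Python's print always goes through a text layer like this, so the str is encoded for the output stream before any bytes are written. In Perl you add the equivalent layer yourself, which is what the binmode and use open lines above do.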

Upvotes: 7
