Amrin Mushtaq

Reputation: 11

Printing Unicode character on Windows

I am trying to create a file whose name contains the Unicode character U+662F on Windows (via Perl or Python, either is fine for me). On Linux I get the character 是, but on Windows I get 是 instead, and somehow I am not able to get the file named 是.

Python code -

 import sys
 name = unichr(0x662f)
 print(name.encode('utf8').decode(sys.stdout.encoding))

perl code -

my $name .= chr(230).chr(152).chr(175); ##662f
print 'file name ::'. "$name"."txt";

Upvotes: 0

Views: 208

Answers (1)

Silvar

Reputation: 705

File manipulation in Perl on Windows (Unicode characters in file name)

In Perl on Windows, I use Win32::Unicode, Win32::Unicode::File and Win32::Unicode::Dir. They work perfectly with Unicode characters in file names.

Just mind that Win32::Unicode::File::open() (and new()) have a reversed argument order compared to Perl's built-in open(): the mode comes first.

You do not need to encode the characters manually - just insert them as they are (if your Perl script is in UTF-8), or using the \x{N} notation.
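Since the question also allows Python, here is a minimal sketch of the same idea there. It assumes Python 3 (where str is Unicode and chr() replaces unichr()), which is not what the question's code uses. The temporary directory and the file contents are only for illustration.

```python
import os
import tempfile

# Build the file name from the code point; no manual byte encoding needed.
name = "\u662f" + ".txt"

# Hypothetical scratch directory so the example does not touch real files.
target_dir = tempfile.mkdtemp()
path = os.path.join(target_dir, name)

# open() accepts the Unicode str directly as the file name.
with open(path, "w", encoding="utf-8") as fh:
    fh.write("test\n")

# ascii() keeps the listing printable even on a console that cannot show U+662F.
print(ascii(os.listdir(target_dir)))
```

The point is the same as with \x{N} in Perl: the name stays a Unicode string all the way to the file API, and the encode/decode round trip from the question is not needed.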


Printing out Unicode characters on Windows

Printing Unicode to the console on Windows is another problem. You can't use cmd.exe. Instead, use PowerShell ISE. The drawback of the ISE is that it's not a real console: scripts can't take keyboard input through STDIN.

To get Unicode output, you need to set the output encoding to UTF-8 in every PowerShell ISE session that's started. I suggest doing so in the startup script.
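To see why the output encoding matters, here is a small Python 3 sketch that reproduces the garbled characters from the question. It assumes the console was interpreting output as Windows code page 1252; your actual code page may differ, in which case the garbage looks different.

```python
# U+662F encoded as UTF-8 gives three bytes.
utf8_bytes = "\u662f".encode("utf-8")
print(list(utf8_bytes))  # [230, 152, 175], the same values as chr(230).chr(152).chr(175)

# Assumption: the console reads those bytes as cp1252, one character per byte.
mojibake = utf8_bytes.decode("cp1252")

# ascii() avoids printing problems on a non-UTF-8 console.
print(ascii(mojibake))
```

The three resulting characters are the ones shown in the question, so the bytes were right and only the console's interpretation of them was wrong.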

Procedure to have PowerShell ISE default to Unicode output:

1) In order for any user PowerShell scripts to be allowed to run, you first need to run:

Set-ExecutionPolicy RemoteSigned

2) Edit or create your Documents\WindowsPowerShell\Microsoft.PowerShellISE_profile.ps1 so that it contains something like:

perl -w -e "print qq!Initializing the console with Perl...\n!;"
[System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8;

The short Perl command is there as a trick to allow the System.Console property to be modified. Without it, you get an error when setting the OutputEncoding.

If I recall correctly, you also have to change the font to Consolas.
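Once the console expects UTF-8, the script has to actually emit UTF-8 bytes. A rough Python 3 sketch of one way to do that follows; write_utf8 is a hypothetical helper name of mine, not part of any library, and I have not checked how the ISE host treats raw byte output.

```python
import io
import sys

def write_utf8(binary_stream, text):
    """Encode text as UTF-8, write the raw bytes, and return the byte count."""
    data = text.encode("utf-8")
    binary_stream.write(data)
    return len(data)

# Demonstrate on an in-memory stream so the bytes can be inspected.
demo = io.BytesIO()
written = write_utf8(demo, "\u662f\n")

# On a real console, sys.stdout.buffer is the underlying binary stream.
# It can be missing when stdout has been replaced, hence the guard.
console = getattr(sys.stdout, "buffer", None)
if console is not None:
    write_utf8(console, "file name :: \u662f.txt\n")
    console.flush()
```

Writing bytes directly sidesteps whatever encoding Python guessed for stdout, so the output matches the UTF-8 setting made in the profile script.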

Even when the Unicode characters print out fine, you may have trouble including them in command line arguments. In these cases I've found the \x{N} notation works. The Windows Character Map utility is your friend here.


(Edited heavily after I rediscovered the regular PowerShell's inability to display most Unicode characters, with references to PowerShell (non-ISE) removed. Now I remember why I started using the ISE...)

Upvotes: 1
