Reputation: 6798
I am trying to figure out how, on Windows 10, a Perl script can read an argument containing Cyrillic text (cp437) and store it in a text file encoded as UTF-8.
In the console, the chcp command reports code page 437.
A search on Stack Overflow returned several questions of a similar nature. I tried to apply what I learned from those posts, but without success.
Examples demonstrating this would be greatly appreciated.
NOTE: the conversion from console input (cp437) to output (cp1251) is purely to demonstrate what is involved and how it is done properly.
UPDATE: cp437 does not include Cyrillic symbols. Perl uses ANSI system calls [CreateFileA], so Cyrillic characters cannot be passed into Perl without an additional workaround. The default code page for my system is cp1252, which does not cover Cyrillic symbols either.
Upvotes: 0
Views: 522
Reputation: 385917
The command line can be obtained from the OS using the "ANSI" interface or using the "Wide" interface.
The ANSI interface uses text encoded using the active code page.
The Wide interface uses text encoded using UTF-16le.
Perl uses the ANSI interface (though you could access the Wide interface through Win32::API, for example).
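To make the difference between the two interfaces concrete, here is a minimal sketch that encodes the same two-letter Cyrillic string both ways with the core Encode module. It assumes an active code page of 1251 purely for illustration (the asker's system uses cp1252), and the `hex_dump` helper is a hypothetical name introduced for the example.

```perl
use strict;
use warnings;
use Encode qw( encode );

# "Да" written with escapes so the script does not depend on its own file encoding.
my $text = "\x{414}\x{430}";

# Bytes an ANSI (A) call would deliver if the active code page were 1251 (assumed).
my $ansi_bytes = encode('cp1251', $text);

# Bytes a Wide (W) call would deliver, whatever the code page is.
my $wide_bytes = encode('UTF-16LE', $text);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump { join(' ', map { sprintf('%02x', ord($_)) } split(//, $_[0])) }

print "ANSI (cp1251):   ", hex_dump($ansi_bytes), "\n";   # c4 e0
print "Wide (UTF-16LE): ", hex_dump($wide_bytes), "\n";   # 14 04 30 04
```

The ANSI form only works when the code page has a slot for each character, while the UTF-16LE form can represent any Unicode character.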
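use strict;
use warnings;
use Encode qw( decode );
use Win32 qw( );
my $qfn = 'args.txt';  # Output file name (hypothetical; not given in the original).
# The ANSI interface delivers @ARGV as bytes in the active code page.
my $acp = "cp".Win32::GetACP();
@ARGV = map { decode($acp, $_) } @ARGV;
# Write the decoded text out as UTF-8.
open(my $fh, '>:encoding(UTF-8)', $qfn)
   or die("Can't create \"$qfn\": $!\n");
print($fh "$_\n") for @ARGV;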
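The decode-then-write pipeline can be checked away from a Windows console with a small sketch like the following. It simulates one argument as cp1251 bytes (an assumed code page, standing in for whatever Win32::GetACP() reports), writes it through the UTF-8 layer to a temporary file, and reads the raw bytes back.

```perl
use strict;
use warnings;
use Encode     qw( decode );
use File::Temp qw( tempfile );

# Simulated argument: the bytes of "Да" in cp1251 (assumed active code page).
my $acp  = 'cp1251';
my @args = ("\xC4\xE0");

# Decode from the code page into Perl text strings.
my @decoded = map { decode($acp, $_) } @args;

# Write the text through an encoding layer so the file holds UTF-8.
my ($tmp, $qfn) = tempfile(UNLINK => 1);
close($tmp);
open(my $out, '>:encoding(UTF-8)', $qfn)
    or die("Can't create \"$qfn\": $!\n");
print($out $_) for @decoded;
close($out)
    or die("Can't write \"$qfn\": $!\n");

# Read the raw bytes back to confirm the on-disk encoding.
open(my $in, '<:raw', $qfn)
    or die("Can't open \"$qfn\": $!\n");
my $bytes = do { local $/; <$in> };
close($in);

print join(' ', map { sprintf('%02x', ord($_)) } split(//, $bytes)), "\n";   # d0 94 d0 b0
```

The two input bytes become four bytes on disk, which is the UTF-8 encoding of the same two letters.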
It's important to note that the encoding used by the console (as shown by chcp) is not the same as the active code page. This means that @ARGV can only contain characters that are in both the OEM code page (the encoding used by the console) and the active code page (the encoding used by the ANSI interface).
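The effect of that limitation can be approximated with Encode alone. This is only a sketch: it assumes cp1252 as the active code page (as in the question's UPDATE) and uses Encode's default substitution behaviour as a stand-in for the conversion Windows performs, which may differ in detail.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# "Да ok": two Cyrillic letters followed by plain ASCII.
my $arg = "\x{414}\x{430} ok";

# Assumed active code page for this sketch.
my $acp = 'cp1252';

# By default, Encode substitutes '?' for characters the target encoding lacks,
# so the Cyrillic letters are lost before the script ever sees them.
my $as_seen_by_perl = decode($acp, encode($acp, $arg));

print "$as_seen_by_perl\n";   # ?? ok
```

Once a character has been replaced like this, no amount of decoding inside the script can recover it, which is why the wide interface is needed.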
To remove this limitation, one would use the wide interface of the system call that gets the command line (GetCommandLineW) and the wide interface of the system call that parses it (CommandLineToArgvW). This provides the arguments no matter what encoding the console uses. With code page 65001 being used in the console, this allows any Unicode character to be used in arguments.
This page contains Perl code to make those system calls.
Upvotes: 1