Reputation: 445
I have an encoding problem with Perl and Windows. On a Windows 7 running Perl (strawberry 5.16) and a simple TK GUI I need to open files and/or access directories with non-english characters in their name/path. For opening files I've come out with this solution which seems to work fine:
#!/usr/bin/perl -w
use strict;
use warnings;
use Win32::Unicode::File;
use Encode;
use Tk;
my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
-text => 'Open file',
-command => [ \&select_unicode_file ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();
sub select_unicode_file{
my $types = [ ['Txt', '.txt'],
['All Files', '*'],];
my $input_file= $mw->getOpenFile(-filetypes => $types);
my $fh = Win32::Unicode::File->new;
if ($fh->open('<', $input_file)){
while (my $line = $fh->readline()){
print "\n$line\n";
}
close $fh;
}
else{
print "Couldn't open file: $!\n";
}
}
This correctly opens files such as Поиск/Поиск.txt
What I CANNOT do is to simply get a directory path and than process it. I think I should use use Win32::Unicode::Dir but I really can't understand the documentation.
It should be something like this:
#!/usr/bin/perl -w
use strict;
use warnings;
use Win32::Unicode::Dir;
use Encode;
use Tk;
my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
-text => 'Open file',
-command => [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();
sub select_unicode_directory{
my $dir = $mw->chooseDirectory( );
my $wdir = Win32::Unicode::Dir->new;
my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";
open (MYFILE, $dir_complete );
while (<MYFILE>) {
chomp;
print "$_\n";
}
close (MYFILE);
}
Upvotes: 9
Views: 3085
Reputation: 7922
There is a logical error in:
my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";
$wdir->open('path')
returns an object, not a string. You can't use it like a path. But that is not the worst of it. Sadly, it seems like the Tk implementation does not yet have support for Unicode file names (including chooseDirectory). I guess you will have to write a custom dir selector, but I'm not sure it's even possible.
This is capable of listing files in an ascii-chars folder (and ->fetch can list utf-8 files), and crashes when opening a folder with utf-8 chars. Well, to be fair it crashes when opening ??????
.
use strict;
use warnings;
use Win32::Unicode::Dir;
use Win32::Unicode::Console;
use Encode;
use Tk;
my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
-text => 'Select dir',
-command => [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();
sub select_unicode_directory {
my $wdir = Win32::Unicode::Dir->new;
my $selected = $mw->chooseDirectory(-parent =>$mw);
# http://search.cpan.org/dist/Tk/pod/chooseDirectory.pod#CAVEATS
$selected = encode("utf-8", $selected);
print "selected: $selected\n";
$wdir->open($selected) || die $wdir->error;
print "\$mw->chooseDirectory: $selected\n";
print "\$wdir->open(\$selected): $wdir\n";
# CRASH HERE, presumably because winders can't handle '?' in a file (dir) name
for ($wdir->fetch) {
# http://search.cpan.org/~xaicron/Win32-Unicode-0.38/lib/Win32/Unicode/Dir.pm
next if /^\.{1,2}$/;
my $path = "$selected/$_";
if (file_type('f', $path)) { print "file: $path\n"; }
elsif (file_type('d', $path)) { print " dir: $path\n"; }
}
print "closing \n";
$wdir->close || die $wdir->error;
}
Both samples below were run using: Strawberry Perl 5.12.3 built for MSWin32-x64-multi-thread
selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$mw->chooseDirectory: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2e38158)
>>> perl crash <<<
selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$mw->chooseDirectory: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2b92c10)
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/.select_uni_dir.pl.swp
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o.dir
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_dir.pl
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_file.pl
dir: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/Поиск
The Tk dir selector returns ????? instead of Поиск. While looking for a way to enable Unicode in Tk, I found this:
http://search.cpan.org/dist/Tk/pod/UserGuide.pod#Perl/Tk_and_Unicode :
(...) Unfortunately, there are still places in Perl ignorant of Unicode. One of these places are filenames. Consequently, the file selectors in Perl/Tk do not handle encoding of filenames properly. Currently they suppose that filenames are in iso-8859-1 encoding, at least on Unix systems. As soon as Perl has a concept of filename encodings, then Perl/Tk will also implement such schemes.
So at first glance it seems what you're trying to do is impossible (unless you write or find a custom dir-selector). Actually, it may not be a bad idea to submit this bug, because the GUI did show "Поиск" so the error is in the return value.
Upvotes: 1