Daniel

Reputation: 195

Perl: How to reliably shell_quote filenames with UTF8

I need to check for the existence of some file that may

Because of the whitespace, I use String::ShellQuote. This, however, does not seem to work well with umlauts when run on OS X (I don't know yet about other operating systems):

    # vim: ft=perl fenc=utf8
    # perl 5, version 12, subversion 4 (v5.12.4) built for darwin-thread-multi-2level

    use strict;
    use warnings;
    use String::ShellQuote;

    my @files = map {$_, shell_quote($_)} ("AOU.tmp", "ÄÖÜ.tmp", "A OU.tmp", "Ä ÖU.tmp");
    foreach my $file ( @files, ) {
        print "$file:\t";
        `touch $file`;
        print "created, " if( !$? ) ;
        print "EXISTS (says Perl), " if( -e $file );
        `ls -1 $file >/dev/null`;
        print "EXISTS (says ls), " if( !$? );
        print "\n";
    }

Output:

    AOU.tmp:    created, EXISTS (says Perl), EXISTS (says ls), 
    AOU.tmp:    created, EXISTS (says Perl), EXISTS (says ls), 
    ÄÖÜ.tmp:    created, EXISTS (says Perl), EXISTS (says ls), 
    'ÄÖÜ.tmp':  created, EXISTS (says ls), 
    A OU.tmp:   created, EXISTS (says Perl), EXISTS (says ls), 
    'A OU.tmp': created, EXISTS (says ls), 
    Ä ÖU.tmp:   created, EXISTS (says Perl), EXISTS (says ls), 
    'Ä ÖU.tmp': created, EXISTS (says ls), 

Question: How can I reliably shell_quote filenames that may contain extended characters?

Side note: I assume this is one of these totally great OS-X typical UTF8 normalization issues (precomposed vs. decomposed encoding of Umlauts). Nevertheless, I think that String::ShellQuote should be able to deal with it.
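
To make the precomposed vs. decomposed distinction concrete, here is a minimal sketch using the core modules Unicode::Normalize and Encode. It only shows that the two forms of the same name differ at the byte level; it does not establish that normalization is what causes the output above. The variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);
use Encode qw(encode);

# "ÄÖÜ.tmp" written with escapes so the example does not depend on the
# encoding of this source file.
my $name = "\x{C4}\x{D6}\x{DC}.tmp";
utf8::upgrade($name);

# Precomposed (NFC) and decomposed (NFD) forms, encoded as UTF-8 bytes.
my $nfc = encode('UTF-8', NFC($name));
my $nfd = encode('UTF-8', NFD($name));

my $nfc_len = length $nfc;   # each umlaut is one 2-byte code point
my $nfd_len = length $nfd;   # each umlaut is a base letter plus 2-byte U+0308

print "NFC bytes: $nfc_len\n";
print "NFD bytes: $nfd_len\n";
print "same bytes: ", ($nfc eq $nfd ? "yes" : "no"), "\n";
```

A filesystem that stores one form and a program that supplies the other can disagree about a name only if the lookup is byte-exact, so a comparison like this is a reasonable first check before blaming the quoting module.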

Upvotes: 1

Views: 438

Answers (1)

Nathaniel Waisbrot

Reputation: 24493

As far as I can tell, the bugs are all yours.

Let's run through the two loop iterations for A OU.tmp:

First, the unquoted form.

  1. You print A OU.tmp
  2. You run touch A OU.tmp. The shell splits this on the space, so it creates (or updates) two files, A and OU.tmp.
  3. touch ran successfully, so you print "created, "
  4. You check -e "A OU.tmp". There is no such file. (I believe you've mis-transcribed your output, because it is not what I get when I run your code under perl 5, version 12, subversion 4 (v5.12.4) built for darwin-thread-multi-2level.)
  5. You run ls A OU.tmp. This is roughly equivalent to running ls A && ls OU.tmp. Both of these files exist, so the command succeeds.
  6. Since it worked, you print "EXISTS (says ls), "

Next time through the loop, shell_quote has made $file equal to 'A OU.tmp' (including the single quotes).

  1. You print 'A OU.tmp'
  2. You run touch 'A OU.tmp'. This creates (or updates) a single file named A OU.tmp (because the space was quoted).
  3. touch ran successfully, so you print "created, "
  4. You check -e "'A OU.tmp'". There is no such file. There is a file named A OU.tmp, but no file named 'A OU.tmp', which is what you're asking Perl to look for. (Perl is not your shell, so if you give Perl shell-quoted things, it's not going to interpret them the way the shell does.)
  5. You run ls 'A OU.tmp'. This checks for a single file with a space in its name, which exists, so the command succeeds.
  6. Since it worked, you print "EXISTS (says ls), "
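
The mismatch in step 4 can be reproduced in isolation. The sketch below assumes a Unix-like system with touch on the PATH, and it uses a simplified stand-in for shell_quote (the hypothetical quote_for_shell, which always single-quotes) so that it runs without the non-core String::ShellQuote module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Simplified stand-in for String::ShellQuote::shell_quote (an assumption,
# used only to avoid the dependency): always wrap in single quotes.
sub quote_for_shell {
    my ($name) = @_;
    (my $quoted = $name) =~ s/'/'\\''/g;
    return "'$quoted'";
}

my $dir    = tempdir(CLEANUP => 1);
my $raw    = "$dir/A OU.tmp";           # the name as Perl should see it
my $quoted = quote_for_shell($raw);     # the name as the shell should see it

# The shell strips the quotes and creates one file with a space in its name.
system("touch $quoted") == 0 or die "touch failed: $?";

# Perl does no shell parsing: the quotes would be part of the file name.
my $raw_exists    = (-e $raw)    ? 1 : 0;
my $quoted_exists = (-e $quoted) ? 1 : 0;

print "raw name exists:    $raw_exists\n";
print "quoted name exists: $quoted_exists\n";
```

Keeping two variables, one raw and one quoted, makes it harder to hand the wrong form to the wrong consumer.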

The central problem seems to be that you're treating Perl like a thin layer over the shell. You should generally choose to work with files either in Perl or in the shell.

In Perl:

    # do not use shell_quote
    foreach my $file (@files) {
        # three-argument open: $file is never parsed as part of the mode
        open(my $fh, '>>', $file) or die "Cannot open '$file': $!";
        close($fh);
        print "yep!" if -e $file;
    }

In shell (via Perl):

    # use only shell_quote
    foreach my $file (@files) {
        `touch $file`;
        print "yes!" if `ls $file`;
    }
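
A further option, which goes beyond the two variants above, is to skip the shell altogether: when system is given a list, Perl runs the program directly and passes each element as one argument, so no quoting is needed and the same raw name works for -e. This is a minimal sketch assuming a Unix-like system with touch on the PATH; the temporary directory and the %seen_by_perl hash are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my @files = map { "$dir/$_" } ("AOU.tmp", "A OU.tmp", "Ä ÖU.tmp");

my %seen_by_perl;
for my $file (@files) {
    # List form of system: no shell is involved, so spaces and other
    # special characters in $file reach touch as a single argument.
    system('touch', $file) == 0 or die "touch failed for '$file': $?";

    # The same unquoted name is what Perl's file test expects.
    $seen_by_perl{$file} = (-e $file) ? 1 : 0;
    print "$file: ", ($seen_by_perl{$file} ? "exists" : "missing"), "\n";
}
```

With this approach shell_quote is only needed when a command string really has to go through the shell, for example because of redirection or pipes.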

Upvotes: 5
