aidan

Reputation: 9576

How can I safely pass a filename with spaces to an external command in Perl?

I have a Perl script that processes a bunch of file names, and uses those file names inside backticks. But the file names contain spaces, apostrophes and other funky characters.

I want to be able to escape them properly (i.e. not using a random regex off the top of my head). Is there a CPAN module that correctly escapes strings for use in bash commands? I know I've solved this problem in the past, but I can't find anything on it this time. There seems to be surprisingly little information on it.

Upvotes: 7

Views: 9022

Answers (3)

mklement0

Reputation: 437278

tl;dr

The following subroutine safely quotes (escapes) a list of filenames (paths) on both Unix-like and Windows systems:

#!/usr/bin/env perl

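# Quote each argument for the platform's shell and join the results with spaces.
# Note: the /r (non-destructive) substitution flag requires Perl 5.14+.
sub quoteforshell { 
  return join ' ', map { 
    $^O eq 'MSWin32' ?
      '"' . s/"/""/gr . '"'       # cmd.exe: "..." with each " doubled
      : 
      "'" . s/'/'\\''/gr . "'"    # POSIX sh: '...' with each ' as '\''
  } @_;
}

# Sample invocation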
my $shellcmd = ($^O eq 'MSWin32' ? 'echo ' : 'printf "%s\n" ') . 
  quoteforshell('\\foo/bar', 'I\'m here', '3" of snow', 'bar |&;()<>#!');

print `$shellcmd`;

Output of the sample command on Unix-like systems, showing that all input arguments were passed through unmodified:

\foo/bar
I'm here
3" of snow
bar |&;()<>#!
  • On Unix-like systems, it should work with any strings (except ones with embedded NUL chars), not just filenames - see below for details.

  • On Windows, embedded " instances are escaped as "", which is the only safe way to do it, but, sadly, may not be what the target program expects - see below for details; note, however, that this is not a concern if you're only passing filenames on Windows, because " is not a legal filename character.

  • See the bottom of this post for a shell-less command-invocation alternative that bypasses the "-quoting problem on Windows.


On Unix-like platforms, qx// (the generalized form of `...`) and the single-argument forms of system and exec invoke the shell by passing the command to /bin/sh -c. /bin/sh is assumed to be POSIX-compatible (and may or may not be Bash on a given system).

The single-argument forms of system and exec may or may not involve a shell - they decide based on the specific command passed whether involvement of a shell is needed. For instance, if a command has embedded (literal) single- or double-quotes, the shell is called. Since the solution below is based on embedding single-quoted tokens in the command string, it also works with the single-argument form of system and exec.

In POSIX-compatible shells you can take advantage of single-quoted strings, which do not interpolate their contents in any way.

The only challenge is to escape single-quotes (') themselves, which requires trickery, because, strictly speaking, embedding single-quotes in a single-quoted string is not supported by the shell.

The trick is to replace every ' instance with '\'' (sic), which works around the problem by effectively splitting the input string into multiple single-quoted strings, with escaped ' instances - \' - spliced in - the shell then reassembles the string parts into a single string.

Here's a subroutine that takes a list of strings (filenames) and returns a space-separated string of quoted versions of those strings, which guarantees literal use by the shell:

sub quoteforsh { join ' ', map { "'" . s/'/'\\''/gr . "'" } @_ }

Example (uses most POSIX shell metacharacters):

my $shellcmd = 'printf "%s\n" ' . 
                  quoteforsh('\\foo/bar', 'I\'m here', '3" of snow', 'bar |&;()<>#!');
print `$shellcmd`;

This passes the following to /bin/sh -c (shown here as a pure literal, without any quoting):

 printf "%s\n" '\foo/bar' 'I'\''m here' '3" of snow' 'bar |&;()<>#!'

Note how each input string is enclosed in single-quotes, and how the only character that needed quoting among all input strings was ', which, as discussed, was replaced with '\''.

This should output the input strings as-is, one on each line:

\foo/bar
I'm here
3" of snow
bar |&;()<>#!
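To check the quoting yourself, a round trip through the shell can be sketched as follows. This is a minimal sketch, assuming a POSIX-compatible /bin/sh with printf available; survives_shell is a hypothetical helper name, not part of any module, and quoteforsh is repeated so the snippet runs on its own.

```perl
use strict;
use warnings;

# Same rule as quoteforsh: wrap in '...' and rewrite each ' as '\''.
# (The /r substitution flag needs Perl 5.14 or later.)
sub quoteforsh { join ' ', map { "'" . s/'/'\\''/gr . "'" } @_ }

# Hypothetical helper: true if every string comes back from /bin/sh unchanged.
sub survives_shell {
    my @in = @_;
    return 1 unless @in;    # nothing to pass, nothing to mangle
    # q{} keeps \n literal so that printf, not Perl, interprets it.
    my $cmd = q{printf '%s\n' } . quoteforsh(@in);
    my $out = `$cmd`;
    # printf emits each argument followed by a newline, in order.
    return $out eq join '', map { $_ . "\n" } @in;
}

my @names = ('\\foo/bar', "I'm here", '3" of snow', 'bar |&;()<>#!', '$HOME `id`');
print survives_shell(@names) ? "round trip ok\n" : "round trip FAILED\n";
```

Comparing the whole output against the inputs joined with newlines means the check also covers strings that themselves contain newlines.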

On Windows, the analogous subroutine looks like this:

sub quoteforcmdexe { join ' ', map { '"' . s/"/""/gr . '"' } @_ }

This works analogously to quoteforsh() above, except that

  • double-quotes are used to enclose the tokens, because cmd.exe doesn't support single-quoting.
  • the only character that needs escaping is ", which is escaped as "" - note, however, that for filenames this isn't strictly necessary, because Windows doesn't allow " instances in filenames.
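To see what the Windows variant produces, the command line can be built and printed without executing it. This sketch only constructs the string, so it runs on any platform; the sample path is made up for illustration.

```perl
use strict;
use warnings;

# Same rule as quoteforcmdexe: wrap in "..." and double each embedded ".
sub quoteforcmdexe { join ' ', map { '"' . s/"/""/gr . '"' } @_ }

# Build (but do not run) a command line for cmd.exe.
my $cmdline = 'echo ' . quoteforcmdexe('C:\\Program Files\\foo.txt', '3" of snow', 'a & b');
print "$cmdline\n";
# prints: echo "C:\Program Files\foo.txt" "3"" of snow" "a & b"
```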

However, there are limitations and pitfalls:

  • You cannot suppress interpretation of references to existing environment variables, such as %USERNAME%; by contrast, non-existing variables or isolated % instances are fine.
    • Note: You should be able to escape % instances as %%, but while that works in a batch file, it inexplicably doesn't work from Perl:
      • `perl "%%USERNAME%%.pl"` complains, e.g., about %jdoe%.pl not being found, implying that %USERNAME% was interpolated, despite the doubled % chars.
      • (On the flip side, isolated % instances in double-quoted strings don't need escaping the way they do in batch files.)
  • Escaping embedded " instances as "" is the only SAFE way to do it, but it is not what most target programs expect.
    • On Windows, incredibly, the required escaping is ultimately up to the target program - for full background, see https://stackoverflow.com/a/31413730/45375
    • In short, the quandary is:
      • If you escape for the target program - and most, including Perl, expect \" - then part of the argument list may never be passed to the target program, with the remaining part either causing failure, unwanted redirection to a file, or, worse, unexpected execution of arbitrary commands.
      • If you escape for cmd.exe, you may break the target program's parsing.
      • You cannot escape for both.
      • You can work around the problem if your command doesn't need to involve the shell at all - see below.

Alternative: shell-less command invocation

If your command is an invocation of a single executable with all arguments to be passed as-is, there's no need to involve the shell at all, which:

  • doesn't require quoting of the arguments, which notably bypasses the "-quoting problem on Windows
  • is generally more efficient

The following subroutine works on both Unix-like systems and Windows. It is a shell-less alternative to qx// (`...`) that accepts the command to invoke as a list of arguments, which are passed as-is:

sub qxnoshell {
  use IPC::Cmd;
  return unless @_;
  my @cmdargs = @_;
  if ($^O eq 'MSWin32') { # Windows
    # Ensure that the executable name ends in '.exe'
    $cmdargs[0] .= '.exe' unless $cmdargs[0] =~ m/\.exe$/i;
    unless (IPC::Cmd::can_run $cmdargs[0]) { # executable not found
      # Issue warning, as qx// would and open '-|' below does.
      my $warnmsg = "Executable '$cmdargs[0]' not found";
      scalar(caller) eq 'main' ? warn($warnmsg . "\n") : warnings::warnif('exec', $warnmsg);
      return; 
    }
    for (@cmdargs[1..$#cmdargs]) {
      if (m'"') {
        s/"/\\"/g; # \-escape every embedded double-quote
        $_ = '"' . $_ . '"'; # enclose as a whole in embedded double-quotes
      }
    }
  }
  open my $fh, '-|', @cmdargs or return;
  my @lines = <$fh>;
  close $fh;
  return wantarray ? @lines : join('', @lines);
}

Examples

# Unix: $out should receive literal '$$', which demonstrates that
# /bin/sh is not involved.
my $out = qxnoshell 'printf', '%s', '$$';

# Windows: $out should receive literal '%USERNAME%', which demonstrates
# that cmd.exe is not involved.
my $out = qxnoshell 'perl', '-e', 'print "%USERNAME%"';

  • Requires Perl v5.9.5+ due to use of IPC::Cmd.
  • Note that the subroutine works hard to make things work on Windows:
    • Even though the arguments are passed as a list, open ..., '-|' on Windows still falls back on cmd.exe if the initial invocation attempt fails - the same applies to system() and exec(), incidentally.
    • Thus, in order to prevent this fallback to cmd.exe - which can have unintended consequences - the subroutine (a) ensures that the first list argument is an *.exe executable, (b) tries to locate it, and (c) only tries to invoke the command if the executable could be located.
    • On Windows, sadly, any argument that contains embedded double-quotes is not passed through correctly to the target program - it needs escaping by (a) adding embedded double-quotes to enclose that argument, and (b) by escaping the original embedded double-quotes as \".

Upvotes: 2

Sinan Ünür

Reputation: 118118

Are you looking for quotemeta?

Returns the value of EXPR with all non-"word" characters backslashed.

Update: As hobbs points out in the comments, quotemeta is not intended for this purpose and, on further thought, might have problems with embedded NULs. String::ShellQuote, on the other hand, croaks upon encountering embedded NULs.

The safest way is to avoid the shell entirely. Using the list form of system goes a long way towards that, and it is what I would recommend (although I found out to my dismay a few months ago that cmd.exe might still get involved on Windows).

If you need the output of the command, you are best off (safety-wise) opening a pipe yourself, as shown in hobbs' answer.
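The list form of system mentioned above might look like this. It is a minimal sketch, assuming a Unix-like system; run_without_shell is a hypothetical helper name, and $^X (the running perl) stands in for the external command purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list, never through a shell.
# The indirect-object form (system { PROGRAM } LIST) bypasses the shell even
# when the list has only one element.
sub run_without_shell {
    my @cmd = @_;
    my $rc = system { $cmd[0] } @cmd;
    die "Could not start $cmd[0]: $!" if $rc == -1;
    return $? >> 8;    # exit status of the child
}

my $file = q{my file's "name" & more.txt};

# The child exits 0 only if the filename arrived as exactly one argument.
my $status = run_without_shell($^X, '-e', 'exit(@ARGV == 1 ? 0 : 1)', $file);
print "child exit status: $status\n";
```

Because no shell parses the arguments, the filename needs no quoting at all here.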

Upvotes: 3

hobbs

Reputation: 239801

If you can manage it (i.e. if you're invoking some command directly, without any shell scripting or advanced redirection shenanigans), the safest thing to do is to avoid passing data through the shell entirely.

In perl 5.8+:

my @output_lines = do {
    open my $fh, "-|", $command, @args or die "Failed spawning $command: $!";
    <$fh>;
};

If it's necessary to support 5.6:

my @output_lines = do {
    my $pid = open my $fh, "-|";
    die "Couldn't fork: $!" unless defined $pid;
    if (!$pid) {
        exec $command, @args or die "Eek, exec failed: $!";
    } else {
        <$fh>; # This is the value of the C<do>
    }
};

See perldoc perlipc for more information on this kind of business, and see also IPC::Open2 and IPC::Open3.
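As a rough illustration of the IPC::Open3 route, the following sketch captures stdout, stderr and the exit status without a shell. capture_noshell is a hypothetical helper name, and $^X is used as the command only so the example is self-contained. It reads stdout to the end before touching stderr, so it is only suitable for commands that write little to stderr.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use Symbol qw(gensym);

# Hypothetical helper: run a command list without a shell and capture its output.
# Pass at least two elements; a single string may be handed to the shell.
sub capture_noshell {
    my @cmd = @_;
    my $err = gensym;    # open3 needs a pre-made handle for stderr
    my $pid = open3(my $in, my $out, $err, @cmd);
    close $in;           # the child gets EOF on its stdin
    my @stdout = <$out>;
    my @stderr = <$err>;
    waitpid $pid, 0;
    return (\@stdout, \@stderr, $? >> 8);
}

my ($out, $err, $status) =
    capture_noshell($^X, '-e', 'print "$_\n" for @ARGV', "I'm here", '$HOME');
print @$out;
```

For commands that may produce a lot of output on both streams, something like IO::Select would be needed to read them concurrently.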

Upvotes: 6
