Grismar

Reputation: 31354

Taming shlex.split() behaviour

There are other questions on SO that get close to answering mine, but I have a very specific use case that I have trouble solving. Consider this:

from asyncio import create_subprocess_exec, run
from shlex import split


async def main():
    command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
    # split() defaults to posix=True
    proc = await create_subprocess_exec(*split(command))
    await proc.wait()


run(main())

This causes trouble, because program.exe is called with these arguments:

['C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']

That is, the double backslash is no longer there, as shlex.split() removes it. Of course, I could instead (as other answers suggest) do this:

    proc = await create_subprocess_exec(*split(command, posix=False))

But then program.exe is effectively called with these arguments:

['"C:\\some folder"', '-o"\\\\server\\share\\some folder"', '"a \\"', 'quote\\""']

That's also no good, because now the double quotes have become part of the first parameter, where they don't belong, even though the second parameter is now fine. The third parameter has become a complete mess.

Replacing backslashes with forward slashes or removing quotes with regular expressions doesn't work either, for similar reasons.

Is there some way to get shlex.split() to leave double backslashes before server names alone? Or just at all? Why does it remove them in the first place?

Note that, by themselves these are perfectly valid commands (on Windows and Linux respectively anyway):

program.exe "C:\some folder" -o"\\server\share\some folder"
echo "hello \"world\""

And even if I did detect the OS and used posix=True/False accordingly, I'd still be stuck with the double quotes included in the second argument, which they shouldn't be.

Upvotes: 3

Views: 2015

Answers (3)

Grismar

Reputation: 31354

For now, I ended up with this (arguably a bit of a hack):

from os import name as os_name
from shlex import split


def arg_split(args, platform=os_name):
    r"""
    Like calling shlex.split, but sets `posix=` according to platform
    and unquotes previously quoted arguments on Windows
    :param args: a command line string consisting of a command with arguments,
                 e.g. r'dir "C:\Program Files"'
    :param platform: a value like os.name would return, e.g. 'nt'
    :return: a list of arguments like shlex.split(args) would have returned
    """
    return [a[1:-1].replace('""', '"') if a[0] == a[-1] == '"' else a
            for a in (split(args, posix=False) if platform == 'nt' else split(args))]

Using this instead of shlex.split() gets me what I need without breaking UNC paths. I'm sure there are edge cases where escaped double quotes aren't handled correctly, but it has worked for all my test cases and seems to work in all practical cases so far. Use at your own risk.
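As a quick check of the helper above, here is a minimal sketch that runs it on two command lines whose arguments are quoted as a whole. The sample command lines are made up for illustration, and the function body is repeated only so the snippet runs on its own. It says nothing about arguments with quotes in the middle of a word.

```python
from os import name as os_name
from shlex import split


def arg_split(args, platform=os_name):
    # Same logic as the helper above, repeated so this snippet is self-contained.
    parts = split(args, posix=False) if platform == 'nt' else split(args)
    return [a[1:-1].replace('""', '"') if a[0] == a[-1] == '"' else a
            for a in parts]


# Hypothetical command lines, chosen only to exercise the helper.
win_args = arg_split(r'copy "\\server\share\some folder\a.txt" dest', platform='nt')
posix_args = arg_split(r'echo "hello \"world\""', platform='posix')

print(win_args)    # the UNC path keeps both leading backslashes
print(posix_args)  # the escaped quotes are unescaped by POSIX-mode split()
```

Passing `platform=` explicitly makes it possible to test the Windows branch on any OS.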

@balmy made the excellent observation that most people should probably just use:

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_shell(command)

Instead of

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_exec(*split(command))

However, note that this means:

  • it's not easy to check or replace individual arguments
  • you have the problem that always comes with using create_subprocess_shell: if part of your command is based on external input, someone can inject code. In the words of the documentation (https://docs.python.org/3/library/asyncio-subprocess.html):

It is the application’s responsibility to ensure that all whitespace and special characters are quoted appropriately to avoid shell injection vulnerabilities. The shlex.quote() function can be used to properly escape whitespace and special shell characters in strings that are going to be used to construct shell commands.

And that's still a problem, as shlex.quote() also doesn't work correctly for Windows (by design).
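To illustrate what the documentation means, here is a small sketch of shlex.quote() protecting a value for a POSIX shell. The input string and the `ls` command are hypothetical, and this only demonstrates the Unix case; it does not solve the Windows quoting problem.

```python
import shlex

# Hypothetical untrusted input that tries to smuggle in a second command.
user_input = 'some folder; echo injected'

# quote() wraps the value so a POSIX shell treats it as one literal argument.
quoted = shlex.quote(user_input)
shell_command = f'ls {quoted}'

# Parsing the command back with POSIX rules yields the input as a single argument.
parsed = shlex.split(shell_command)

print(shell_command)
print(parsed)
```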

I'll leave the question open for a bit, in case someone wishes to point out why the above is a really bad idea, or if someone has a better one.

Upvotes: 1

For the -o parameter, put the leading " at the start of the argument rather than in the middle, and double the backslashes.

Then use posix=True:

import shlex

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'

print( "Original command Posix=True", shlex.split(command, posix=True) )

command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'

print( "Updated command Posix=True", shlex.split(command, posix=True) )

result:

Original command Posix=True ['program.exe', 'C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
Updated command Posix=True ['program.exe', 'C:\\some folder', '-o\\\\server\\share\\some folder', 'a "quote"']

The backslashes still appear doubled in the printed result, but that's just the standard Python representation of a single \ in a string.
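A short sketch to confirm this: printing the argument itself, rather than the list, shows the actual characters that would be passed to the program.

```python
import shlex

command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'
args = shlex.split(command, posix=True)

# repr() is what printing a list uses; it doubles every backslash.
print(repr(args[2]))
# Printing the string itself shows the real content, with the UNC prefix intact.
print(args[2])
```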

Upvotes: 0

Ture Pålsson

Reputation: 6786

As far as I can tell, the shlex module is the wrong tool if you are dealing with the Windows shell.

The first paragraph of the docs says (my italics):

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell.

Admittedly, that talks about just one class, not the entire module. Later, the docs for the quote function say (boldface in the original, this time):

Warning The shlex module is only designed for Unix shells.

To be honest, I'm not sure what the non-Posix mode is supposed to be compatible with. It could be, but this is just me guessing, that the original versions of shlex parsed a syntax of its own which was not quite compatible with anything else, and then Posix mode got added to actually be compatible with Posix shells. This mailing list thread, including this mail from ESR seems to support this.
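To show how the two modes differ, here is a small sketch based on the parsing rules as I understand them from the shlex documentation. In non-POSIX mode, quotes are kept in the token and are not recognized in the middle of a word, which matches neither a POSIX shell nor (as far as I can tell) the Windows conventions.

```python
import shlex

samples = ('Do"Not"Separate', '"Do"Separate')

# Collect the result of both modes for each sample string.
results = {
    text: (shlex.split(text, posix=False), shlex.split(text, posix=True))
    for text in samples
}

for text, (non_posix, posix) in results.items():
    print(f'{text!r}: non-POSIX -> {non_posix}, POSIX -> {posix}')
```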

Upvotes: 0
