c_risk

Reputation: 21

How to pass a Python variable to bash code within the same Python program?

I have written Python code to check whether a file exists in the Hadoop file system. The Python function receives a location passed from another function, and the bash code within it checks whether that location exists.

    import subprocess

    def check_file_exists_in_hadoop(loc):
        yourdir = "/somedirectory/inhadoop/"+loc
        cmd = '''
        hadoop fs -test -d ${yourdir};
        if [ $? -eq 0 ]
        then
            echo "Directory exists!"
        else
            echo "Directory does not exist!"
        fi
        '''
        res = subprocess.check_output(cmd, shell=True)
        output = str(res, "utf-8").strip()
        print(output)
        if output == "Directory exists!":
            print("Yay!!!!")
        else:
            print("Oh no!!!!")

How can I pass the 'yourdir' variable into the bash portion of the code?

Upvotes: 2

Views: 100

Answers (2)

Sam Mason

Reputation: 16213

All that playing around in shells looks awkward; why not just do:

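    import subprocess

    def check_file_exists_in_hadoop(loc):
        path = "/somedirectory/inhadoop/" + loc
        # The arguments go straight to hadoop; no shell is involved.
        res = subprocess.run(["hadoop", "fs", "-test", "-d", path])
        # hadoop fs -test exits with status 0 when the test succeeds.
        return res.returncode == 0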

You can then call it as:

    if check_file_exists_in_hadoop('foo.txt'):
        print("Yay!!!!")
    else:
        print("Oh noes!!!!")

When you execute a program on a Unix-like system, the new process receives an array of arguments (exposed as, e.g., sys.argv in Python). You can construct these in various ways, but passing them as a list to subprocess.run gives you the most direct control. You can of course use a shell to do this, but starting up a shell just to build an argument list is unnecessary. Since the argument list is just a Python list of strings, you can use normal list and string operations to construct whatever you need.
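To see that a list passed to subprocess.run reaches the child process unchanged, here is a minimal sketch. It uses the current Python interpreter (sys.executable) as a stand-in child program instead of hadoop, which may not be installed; the helper name child_argv is hypothetical.

```python
import subprocess
import sys

# Stand-in child program (assumption for the demo): prints each argument
# it received on its own line.
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"


def child_argv(*args):
    """Hypothetical helper: run the stand-in child with args, return its argv."""
    res = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return res.stdout.splitlines()


# Spaces, semicolons and $ arrive literally, because no shell parses them.
print(child_argv("/somedirectory/inhadoop/my dir", "a; rm -rf ~", "$HOME"))
# ['/somedirectory/inhadoop/my dir', 'a; rm -rf ~', '$HOME']
```

Because nothing between Python and the child interprets the strings, there is nothing to escape.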

Using a shell can be useful, but as Gilles says, you need to be careful to sanitise/escape your input: not everybody loves Little Bobby Tables!
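If you do need a shell, the standard library's shlex.quote escapes a string so that a POSIX shell treats it as one literal word. Here is a minimal sketch, assuming a POSIX sh. The function name build_test_command is hypothetical, and the hadoop command is only built here, not run.

```python
import shlex
import subprocess


def build_test_command(path):
    """Hypothetical helper: build a shell command with `path` safely quoted."""
    return "hadoop fs -test -d " + shlex.quote(path)


print(build_test_command("/somedirectory/inhadoop/a; rm -rf ~"))
# hadoop fs -test -d '/somedirectory/inhadoop/a; rm -rf ~'

# Sanity check with a harmless command: the quoted value comes back as one
# literal argument instead of being split or run as a second command.
out = subprocess.check_output(
    "printf '%s\\n' " + shlex.quote("a; echo oops"), shell=True, text=True
)
print(out, end="")  # a; echo oops
```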

Upvotes: 3

Pass the string as an argument to the shell. Instead of using shell=True, which runs ['sh', '-c', cmd] under the hood, invoke a shell explicitly. After the shell code, the first argument becomes $0, the shell or script name (unused here apart from appearing in error messages). The next argument is available as "$1" in the shell snippet, the one after that as "$2", and so on.

    cmd = '''
        hadoop fs -test -d "$1";
    …
    '''
    res = subprocess.check_output(['sh', '-c', cmd, 'sh', yourdir])
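Here is a self-contained sketch of the positional-argument approach. Because hadoop may not be available, it uses the ordinary shell command test -d on the local file system as a stand-in for hadoop fs -test -d, and it folds the question's "$?" check into an if. The function name dir_exists_via_sh is hypothetical.

```python
import subprocess
import tempfile

# The question's logic, but the directory arrives as "$1".
# Local `test -d` stands in for `hadoop fs -test -d` (assumption for the demo).
SCRIPT = '''
if test -d "$1"
then
    echo "Directory exists!"
else
    echo "Directory does not exist!"
fi
'''


def dir_exists_via_sh(path):
    """Hypothetical helper: run SCRIPT with `path` as $1 and parse its output."""
    # argv is: sh -c SCRIPT $0 $1, where 'sh' only shows up in error messages.
    res = subprocess.check_output(["sh", "-c", SCRIPT, "sh", path])
    return str(res, "utf-8").strip() == "Directory exists!"


with tempfile.TemporaryDirectory() as d:
    print(dir_exists_via_sh(d))                    # True
    print(dir_exists_via_sh(d + "/no such; dir"))  # False
```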

Alternatively, pass the string as an environment variable.

    cmd = '''
        hadoop fs -test -d "$yourdir";
    …
    '''
    env = os.environ.copy()
    env['yourdir'] = yourdir
    res = subprocess.check_output(cmd, shell=True, env=env)
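The environment-variable variant can be sketched the same way. Again, local test -d stands in for hadoop fs -test -d, and the function name dir_exists_via_env is hypothetical.

```python
import os
import subprocess
import tempfile

# The shell reads the directory from the environment variable $yourdir.
# Local `test -d` stands in for `hadoop fs -test -d` (assumption for the demo).
SCRIPT = '''
if test -d "$yourdir"
then
    echo "Directory exists!"
else
    echo "Directory does not exist!"
fi
'''


def dir_exists_via_env(path):
    """Hypothetical helper: pass `path` to the shell as $yourdir."""
    env = os.environ.copy()  # keep PATH and friends for the child
    env["yourdir"] = path    # visible in the script as "$yourdir"
    res = subprocess.check_output(SCRIPT, shell=True, env=env)
    return str(res, "utf-8").strip() == "Directory exists!"


with tempfile.TemporaryDirectory() as d:
    print(dir_exists_via_env(d))             # True
    print(dir_exists_via_env(d + "/a; b"))   # False
```

Copying os.environ matters. Passing env={"yourdir": path} alone would drop PATH, and the shell might then fail to find hadoop.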

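In the shell snippet, note the double quotes around $1 and $yourdir. Without them, the shell would split the value at whitespace and expand any wildcard characters in it.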
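Here is a small sketch of what the double quotes change. It uses printf instead of hadoop so it runs anywhere with a POSIX sh. The helper name shell_words is hypothetical.

```python
import subprocess


def shell_words(snippet, value):
    """Hypothetical helper: run `snippet` with `value` as $1, return output lines."""
    out = subprocess.check_output(["sh", "-c", snippet, "sh", value], text=True)
    return out.splitlines()


value = "my   dir"
# Unquoted $1 undergoes field splitting, so printf receives two arguments.
print(shell_words('printf "[%s]\\n" $1', value))    # ['[my]', '[dir]']
# Quoted "$1" reaches printf as one argument, with its spacing intact.
print(shell_words('printf "[%s]\\n" "$1"', value))  # ['[my   dir]']
```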

Do not interpolate the string into the shell command directly, i.e. don't use things like 'test -d {}'.format(yourdir). That breaks if the string contains shell special characters, and it's a gaping security hole. For example, if yourdir is "a; rm -rf ~", then you've just kissed your data goodbye.
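The danger can be shown safely by using a harmless echo instead of rm -rf ~. This sketch puts direct interpolation side by side with passing the value as "$1":

```python
import subprocess

value = "a; echo INJECTED"

# UNSAFE: the value is pasted into the shell code, so ';' ends the printf
# command and `echo INJECTED` runs as a second command.
unsafe = subprocess.check_output('printf "%s\\n" {}'.format(value), shell=True, text=True)
print(unsafe.splitlines())  # ['a', 'INJECTED']

# SAFE: the value is passed as $1 and quoted, so it stays one literal string.
safe = subprocess.check_output(["sh", "-c", 'printf "%s\\n" "$1"', "sh", value], text=True)
print(safe.splitlines())    # ['a; echo INJECTED']
```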

Upvotes: 1
