Reputation: 21
I have written Python code to check whether a file exists in the Hadoop file system. The Python function receives a location passed from another function, and the bash code inside it checks whether that location exists.
def check_file_exists_in_hadoop(loc):
    yourdir = "/somedirectory/inhadoop/"+loc
    cmd = '''
    hadoop fs -test -d ${yourdir};
    if [ $? -eq 0 ]
    then
        echo "Directory exists!"
    else
        echo "Directory does not exists!"
    fi
    '''
    res = subprocess.check_output(cmd, shell=True)
    output = (str(res, "utf-8").strip())
    print(output)
    if output == "Directory exists!":
        print("Yay!!!!")
    else:
        print("Oh no!!!!")
How can I pass the 'yourdir' variable into the bash portion of the code?
Upvotes: 2
Views: 100
Reputation: 16213
All that playing around in shells looks awkward. Why not just do:
def check_file_exists_in_hadoop(loc):
    path = "/somedirectory/inhadoop/" + loc
    res = subprocess.run(["hadoop", "fs", "-test", "-d", path])
    return res.returncode == 0
You can call it like this:
if check_file_exists_in_hadoop('foo.txt'):
    print("Yay!!!!")
else:
    print("Oh noes!!!!")
When you run a program on a Unix-like system, it receives an array of arguments (exposed as, e.g., sys.argv in Python). You can construct these in various ways, but passing them to run as a list gives you the most direct control. You can of course use a shell for this, but starting one up just to pass arguments seems unnecessary. Since the argument list is just a list of strings in Python, you can use normal list and string manipulation to construct whatever you need.
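As a rough illustration of that argument array, the sketch below runs a child Python interpreter in place of hadoop (an assumption, so that it works on a machine without Hadoop) and has the child report the arguments it received. The helper name echo_args and the sample path are hypothetical.

```python
import json
import subprocess
import sys


def echo_args(args):
    """Run a child Python process and return the arguments it received."""
    # The child prints its own argument list (minus the program name) as JSON.
    child_code = "import json, sys; print(json.dumps(sys.argv[1:]))"
    argv = [sys.executable, "-c", child_code] + list(args)
    res = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(res.stdout)


# Build the argument list with ordinary string and list operations.
base = "/somedirectory/inhadoop/"
args = ["fs", "-test", "-d", base + "dir with spaces; and $pecial chars"]

received = echo_args(args)
print(received)
```

Each list element should arrive in the child as exactly one argument, spaces and shell metacharacters included, because no shell is involved in parsing it.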
Using a shell can be useful, but as Gilles says, you need to be careful to sanitise/escape your input; not everybody loves Little Bobby Tables!
Upvotes: 3
Reputation: 107889
Pass the string as an argument to the shell. Instead of using shell=True, which runs ['sh', '-c', cmd] under the hood, invoke a shell explicitly. After the shell code, the first argument is the shell or script name (which is unused here), then the next argument is available as "$1" in the shell snippet, the next argument as "$2", etc.
cmd = '''
hadoop fs -test -d "$1";
…
'''
res = subprocess.check_output(['sh', '-c', cmd, 'sh', yourdir])
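To try the positional-argument form without a Hadoop installation, here is a minimal sketch that swaps printf in for hadoop (an assumption made purely for demonstration) and assumes a POSIX sh is on the PATH. The function name run_with_positional_arg is hypothetical.

```python
import subprocess


def run_with_positional_arg(value):
    """Pass value to an sh snippet as "$1" and return what the shell saw."""
    cmd = '''
    printf '%s' "$1"
    '''
    # In ['sh', '-c', cmd, 'sh', value] the second 'sh' becomes $0
    # and value becomes $1.
    res = subprocess.check_output(['sh', '-c', cmd, 'sh', value])
    return res.decode('utf-8')


# A value that would be dangerous if it were pasted into the command text.
hostile = "a; echo INJECTED"
seen = run_with_positional_arg(hostile)
print(seen)
```

The shell receives the value as data in "$1", so the semicolon is never parsed as a command separator.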
Alternatively, pass the string as an environment variable.
cmd = '''
hadoop fs -test -d "$yourdir";
…
'''
env = os.environ.copy()
env['yourdir'] = yourdir
res = subprocess.check_output(cmd, shell=True, env=env)
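The environment-variable form can be checked the same way. This sketch again uses printf as a stand-in for hadoop (an assumption) and assumes a POSIX shell; run_with_env_var is a hypothetical name.

```python
import os
import subprocess


def run_with_env_var(value):
    """Expose value to an sh snippet as $yourdir and return what the shell saw."""
    cmd = '''
    printf '%s' "$yourdir"
    '''
    # Copy the current environment so PATH and friends stay available.
    env = os.environ.copy()
    env['yourdir'] = value
    res = subprocess.check_output(cmd, shell=True, env=env)
    return res.decode('utf-8')


# Quotes and semicolons in the value are not parsed as shell syntax.
hostile = 'a"; echo INJECTED; "'
seen = run_with_env_var(hostile)
print(seen)
```

Because the value is expanded from a variable inside double quotes, the shell treats it as a single word rather than re-parsing it as code.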
In the shell snippet, note the double quotes around $1 or $yourdir.
Do not interpolate the string into the shell command directly, i.e. don't use things like 'test -d {}'.format(yourdir). That doesn't work if the string contains shell special characters: it's a gaping security hole. For example, if yourdir is a; rm -rf ~ then you've just kissed your data goodbye.
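A harmless way to see the problem is to use echo instead of rm. The sketch below assumes a POSIX shell. It also shows shlex.quote from the Python standard library, which is not mentioned in the answer above and is included only as a contrast; passing the value as an argument or environment variable avoids the quoting question entirely.

```python
import shlex
import subprocess

yourdir = "a; echo INJECTED"

# Unsafe: the value becomes part of the command text, so the shell
# sees two commands separated by ';'.
unsafe_cmd = 'echo checking {}'.format(yourdir)
unsafe_out = subprocess.check_output(unsafe_cmd, shell=True, text=True)

# Quoted: shlex.quote wraps the value so the shell reads it as one word.
safe_cmd = 'echo checking {}'.format(shlex.quote(yourdir))
safe_out = subprocess.check_output(safe_cmd, shell=True, text=True)

print(unsafe_out.splitlines())
print(safe_out.splitlines())
```

In the unsafe case the injected echo runs as its own command; in the quoted case the whole string is printed as a single argument.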
Upvotes: 1