Reputation: 31
I am using Python 3.4 on my server, and I frequently need to copy/move multiple files from a local directory to an HDFS directory. All my files are in sub-directories, which in turn are in MyDir. Here is the command I use:
$ hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/
This command runs fine on the server, but when I run the same command from Python using subprocess:
>>> subprocess.call(['hdfs', 'dfs', '-copyFromLocal', 'MyDir/*', '/path/to/hdfs/'])
it gives the following error:
copyFromLocal: `MyDir/*': No such file or directory
1
P.S. I also tried ['hadoop', 'fs', '-put', ...] instead of ['hdfs', 'dfs', '-copyFromLocal', ...], but that does not work either.
Can anyone help me with this? Any help would be appreciated.
EDIT: I need to move the files along with their sub-directories.
Upvotes: 1
Views: 3164
Reputation: 431
Join the whole command into a single string and pass shell=True, so the shell expands the wildcard:
subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
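If you would rather avoid shell=True, you can expand the wildcard in Python with the glob module and keep the list form. A sketch, using the question's placeholder paths:

```python
import glob
import subprocess

# Expand MyDir/* in Python instead of relying on the shell;
# glob.glob returns [] when nothing matches.
local_paths = glob.glob('MyDir/*')

# Each matched path becomes its own argument, so no shell is needed.
cmd = ['hdfs', 'dfs', '-copyFromLocal'] + local_paths + ['/path/to/hdfs/']

# Only invoke hdfs if there is actually something to copy.
if local_paths:
    subprocess.call(cmd)
```

This sidesteps the original error: the shell never saw the pattern, so 'MyDir/*' was passed to hdfs literally.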
Upvotes: 1
Reputation: 366
I would write a subprocess wrapper that returns the return code, output, and error:
import subprocess

def run_cmd(args_list):
    """Run a system command and return (return code, stdout, stderr)."""
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err
Then:
import os

for file in os.listdir('your-directory'):
    run_cmd(['hadoop', 'fs', '-put', 'your-directory/{0}'.format(file), 'target-directory'])
That loops through every entry in your directory and puts it into the desired HDFS directory.
Upvotes: 1
Reputation: 2791
Add shell=True and pass the command as a single string; with shell=True, only the first element of a list reaches the shell, so the wildcard would be dropped:
>>> subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
Read this post: Actual meaning of 'shell=True' in subprocess
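A small demonstration of the list-vs-string difference on POSIX, using echo instead of hdfs so it runs anywhere:

```python
import subprocess

# With shell=True on POSIX, only the FIRST list element is run as the
# shell command; the rest become the shell's positional parameters,
# so the wildcard never reaches the shell here:
out_list = subprocess.check_output(['echo', 'MyDir/*'], shell=True)

# Passing one string hands the whole line to /bin/sh, which can then
# expand the glob (echo prints the pattern literally if nothing matches):
out_string = subprocess.check_output('echo MyDir/*', shell=True)
```

This is why the list form with shell=True appears to "do nothing" with the pattern.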
Upvotes: 1