Ankit Seth

Reputation: 31

How to move multiple files at once from local-server to HDFS inside python?

I am using Python v3.4 on my server and I frequently need to copy/move multiple files from my local directory to an HDFS directory. All my files are in sub-directories, which in turn are in MyDir. Here is the command I use:

$ hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/

This command runs fine on the server, but when I run the same command inside Python using subprocess:

>>> subprocess.call(['hdfs', 'dfs', '-copyFromLocal', 'MyDir/*', '/path/to/hdfs/'])

it gives the following error:

copyFromLocal: `MyDir/*': No such file or directory
1

P.S. I also tried ['hadoop', 'fs', '-put'....] instead of ['hdfs', 'dfs', '-copyFromLocal'....], but it does not work either.

Can anyone help me with this? Any help would be appreciated.

EDIT: I need to move the files along with their sub-directories.

Upvotes: 1

Views: 3164

Answers (3)

Harsha Reddy

Reputation: 431

Join the whole command into a single string and pass shell=True, so that the shell expands the MyDir/* wildcard before hdfs sees it:

subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)
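If you would rather avoid shell=True (for example, because part of the path comes from user input), one option is to expand the wildcard yourself with the standard glob module and pass the resulting paths as separate arguments. This is a minimal sketch, assuming `hdfs` is on the PATH; the helper names `build_copy_command` and `copy_dir_contents_to_hdfs` are hypothetical. Since the shell version already passes several sources to -copyFromLocal, the expanded list should behave the same way, sub-directories included.

```python
import glob
import os
import subprocess


def build_copy_command(local_dir, hdfs_dir):
    """Return the argument list for 'hdfs dfs -copyFromLocal <entries...> <hdfs_dir>'.

    glob.glob does the '*' expansion that the shell would normally do.
    Like the shell's '*', it skips hidden entries (names starting with '.').
    """
    sources = sorted(glob.glob(os.path.join(local_dir, '*')))
    if not sources:
        raise FileNotFoundError('nothing to copy in {0}'.format(local_dir))
    return ['hdfs', 'dfs', '-copyFromLocal'] + sources + [hdfs_dir]


def copy_dir_contents_to_hdfs(local_dir, hdfs_dir):
    """Run the copy without a shell and return the exit status (0 means success)."""
    return subprocess.call(build_copy_command(local_dir, hdfs_dir))
```

Usage would be `copy_dir_contents_to_hdfs('MyDir', '/path/to/hdfs/')`. Keeping the command-building step separate makes it easy to print the argument list and check it before anything touches HDFS.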

Upvotes: 1

Trevor McCormick

Reputation: 366

I would write a helper function around subprocess that returns the return code, the output, and the error:

import subprocess

def run_cmd(args_list):
    """
    Run a Linux command given as a list of arguments.
    Returns (return code, stdout bytes, stderr bytes).
    """
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

Then:

import os
for name in os.listdir('your-directory'):
    run_cmd(['hadoop', 'fs', '-put', os.path.join('your-directory', name), 'target-directory'])

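That should loop through all the entries in your directory and put each one in the desired HDFS directory. Because os.listdir also returns sub-directory names and -put copies a directory recursively, the sub-directories are uploaded as well.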
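The loop above discards the values returned by run_cmd, so a failed upload (for example, a name that already exists in the target directory) goes unnoticed. A slightly more defensive sketch of the same idea checks each exit status and collects the failures. It assumes `hadoop` is on the PATH; `put_commands` and `put_all` are hypothetical helper names.

```python
import os
import subprocess


def put_commands(local_dir, hdfs_dir):
    """Build one 'hadoop fs -put <entry> <hdfs_dir>' argument list per entry in local_dir.

    os.listdir returns sub-directory names as well as file names (hidden ones too).
    """
    return [['hadoop', 'fs', '-put', os.path.join(local_dir, name), hdfs_dir]
            for name in sorted(os.listdir(local_dir))]


def put_all(local_dir, hdfs_dir):
    """Run each -put command and return a list of (local path, stderr text) for failures."""
    failed = []
    for args in put_commands(local_dir, hdfs_dir):
        proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        _, err = proc.communicate()
        if proc.returncode != 0:
            # args[3] is the local path that was being uploaded
            failed.append((args[3], err.decode(errors='replace')))
    return failed
```

Calling `put_all('your-directory', 'target-directory')` returns an empty list when every upload succeeded.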

Upvotes: 1

RaminNietzsche

Reputation: 2791

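Add shell=True so that the shell expands the MyDir/* wildcard. Note that with shell=True the command should be passed as a single string: on POSIX, if a list is passed, only its first item is run as the command and the remaining items become arguments to the shell itself, so hdfs would be started without any arguments.

>>> subprocess.call('hdfs dfs -copyFromLocal MyDir/* /path/to/hdfs/', shell=True)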

Read this post: Actual meaning of 'shell=True' in subprocess
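A small sketch of the list-versus-string difference with shell=True, using the harmless echo command, plus a way to build the command string safely with shlex.quote (available since Python 3.3). The `copy_command_string` helper is hypothetical, and running it against HDFS assumes `hdfs` is on the PATH. The '*' is deliberately left outside the quotes, otherwise the shell would not expand it.

```python
import shlex
import subprocess


def copy_command_string(local_dir, hdfs_dir):
    """Build a shell command string; quote the paths but leave the '*' unquoted."""
    return 'hdfs dfs -copyFromLocal {0}/* {1}'.format(
        shlex.quote(local_dir), shlex.quote(hdfs_dir))


# On POSIX, a list combined with shell=True runs only the FIRST item as the command;
# the remaining items become the shell's positional parameters ($0, $1, ...).
print(subprocess.check_output(['echo', 'hello'], shell=True))  # b'\n' ('hello' is swallowed)
print(subprocess.check_output('echo hello', shell=True))       # b'hello\n'

# Usage (assumes 'hdfs' is on the PATH):
# subprocess.call(copy_command_string('MyDir', '/path/to/hdfs/'), shell=True)
```

For plain names such as MyDir, shlex.quote leaves the text unchanged, so the string is identical to the one typed at the prompt; a name containing spaces gets quoted while the trailing /* is still expanded by the shell.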

Upvotes: 1
