Nitharjan KANAGARAJAH

Reputation: 11

Copy files from SFTP server to HDFS using Python

I'm copying files from a remote SFTP server to my local machine using lftp with the mget command. Then I run hdfs dfs -cp localfolder/localfile to copy them to hdfsLocation. I'd like to copy those files to HDFS without having to store them on my local machine.

The code at the end of this post is what I currently use, but I'd like to bypass the intermediate copy on my local machine. I've also tried pointing lftp's lcd directly at HDFS:

subprocess.Popen("""lftp sftp://login:password@adressLocal -e "lcd hdfs://serverHDFS:8020/projects/folder/child/tmp/;mget /var/projects/stockage/folder/child/.success/"""+fileName.ext+""";bye " """,
                 shell=True,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)

but it does not work.

Thanks for your help.

import os
import subprocess

# Step 1: download the file from the SFTP server into a local directory.
s = subprocess.Popen("""lftp sftp://login:password!@adress -e "lcd /projects/folder/child/tmp/;mget /var/projects/stockage/folder/child/.success/"""+fileName.ext+""";bye " """, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
s.communicate()  # wait for lftp to finish before uploading
# Step 2: upload the local copy to HDFS.
cmd = 'hdfs dfs -put /var/projects/folder/file.ext hdfs://server:8020/projects/folder/tmp/'
subprocess.call(cmd, shell=True)

Upvotes: 1

Views: 805

Answers (1)

OneCricketeer

Reputation: 191973

I suggest you install Apache NiFi, StreamSets, or KNIME, which let you graphically transfer FTP contents to HDFS (and handle other, more advanced ETL workloads).

StreamSets or KNIME will generate Spark code for you behind the scenes.
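
If installing one of those tools is not an option, a possible alternative is to stream the file through a pipe instead of the local disk. The lcd attempt in the question fails because lftp's lcd only changes the local filesystem directory and does not understand hdfs:// URLs. The sketch below is untested against a real cluster and rests on two assumptions: that lftp's cat command writes the remote file to stdout, and that hdfs dfs -put - reads the data to store from stdin. The helper names pipe_commands and build_commands, and the example host and paths in the usage comment, are hypothetical.

```python
import subprocess


def build_commands(sftp_url, remote_path, hdfs_path):
    """Build the two argument lists (hypothetical helper).

    lftp's `cat` prints the remote file to stdout;
    `hdfs dfs -put - <dest>` reads the file content from stdin.
    """
    lftp_cmd = ["lftp", "-e", f"cat {remote_path}; bye", sftp_url]
    hdfs_cmd = ["hdfs", "dfs", "-put", "-", hdfs_path]
    return lftp_cmd, hdfs_cmd


def pipe_commands(producer_cmd, consumer_cmd):
    """Run `producer | consumer` without a shell and return both exit codes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer is signalled
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


# Example usage (placeholder host and paths, adjust to your environment):
# lftp_cmd, hdfs_cmd = build_commands(
#     "sftp://login:password@host",
#     "/var/projects/stockage/folder/child/.success/file.ext",
#     "hdfs://server:8020/projects/folder/tmp/file.ext",
# )
# print(pipe_commands(lftp_cmd, hdfs_cmd))
```

Both exit codes should be checked, since a failed download can still leave an empty or partial file in HDFS. Unlike mget, cat does not preserve file names, so this handles one file per call and the destination name has to be given explicitly.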

Upvotes: 2
