Reputation: 11
I'm copying files from a remote location to my local machine with lftp's mget command, then uploading them to HDFS with hdfs dfs -put localfolder/localfile hdfsLocation. I'd like to copy those files to HDFS without having to store them on my local machine first.
I've tried the code below, but I'd like to bypass the copy through my local machine. I've also tried this:
subprocess.Popen("""lftp sftp://login:password@adressLocal -e "lcd hdfs://serverHDFS:8020/projects/folder/child/tmp/;mget /var/projects/stockage/folder/child/.success/""" + fileName.ext + """;bye" """,
                 shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
but it's not working.
Thanks for your help.
import os
import subprocess
# Note: there must be a space between the host and the -e option.
s = subprocess.Popen("""lftp sftp://login:password!@adress -e "lcd /projects/folder/child/tmp/;mget /var/projects/stockage/folder/child/.success/""" + fileName.ext + """;bye" """,
                     shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
cmd = 'hdfs dfs -put /var/projects/folder/file.ext hdfs://server:8020/projects/folder/tmp/'
subprocess.call(cmd, shell=True)
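For what it's worth, the local round-trip above can in principle be avoided by streaming: lftp can write a remote file to stdout with its cat command, and hdfs dfs -put - reads from stdin into a target path. A minimal sketch, assuming lftp and the hdfs client are on PATH, and with host, credentials, and paths as hypothetical placeholders rather than values from the question:

```python
import subprocess

def build_commands(host, user, password, remote_path, hdfs_dest):
    # lftp's `cat` writes the remote file to stdout; `hdfs dfs -put -`
    # reads stdin into the destination path, so no local file is needed.
    lftp_cmd = ["lftp", f"sftp://{user}:{password}@{host}",
                "-e", f"cat {remote_path}; bye"]
    hdfs_cmd = ["hdfs", "dfs", "-put", "-", hdfs_dest]
    return lftp_cmd, hdfs_cmd

def stream_sftp_to_hdfs(host, user, password, remote_path, hdfs_dest):
    lftp_cmd, hdfs_cmd = build_commands(host, user, password,
                                        remote_path, hdfs_dest)
    # Connect lftp's stdout directly to hdfs's stdin, like a shell pipe.
    reader = subprocess.Popen(lftp_cmd, stdout=subprocess.PIPE)
    writer = subprocess.Popen(hdfs_cmd, stdin=reader.stdout)
    reader.stdout.close()  # let lftp see SIGPIPE if the hdfs side exits early
    return writer.wait()
```

Using argument lists instead of shell=True also avoids quoting problems with passwords containing special characters like the ! above. This transfers one file at a time (cat rather than mget), so looping over file names is needed for multiple files.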
Upvotes: 1
Views: 805
Reputation: 191973
I suggest you install Apache NiFi, StreamSets, or KNIME, which let you graphically transfer FTP contents to HDFS (and handle other, more advanced ETL workloads).
StreamSets and KNIME will generate Spark code for you behind the scenes.
Upvotes: 2