eduardo
eduardo

Reputation: 123

Curl download to HDFS

I had this code:

curl -o fileName.csv url | xargs hdfs dfs -moveFromLocal $1 /somePath/

When i execute this code, curl put the values from request inside fileName.csv, the file are moved to HDFS. I wanna know if i can, mantain the curl output in memory, send to pipe and just write the values inside HDFS?

Something like this(that works):

curl url | xargs hdfs dfs -put $1 /somePath

Upvotes: 4

Views: 2643

Answers (1)

Chris Nauroth
Chris Nauroth

Reputation: 9844

The hdfs dfs -put command can accept file input from stdin, using the familiar idiom of specifying - to mean stdin:

> curl -sS https://www.google.com/robots.txt | hdfs dfs -put - /robots.txt
> hdfs dfs -ls /robots.txt
-rw-r--r--   3 cnauroth supergroup       6880 2017-07-06 09:07 /robots.txt

Another option is to use shell process substitution to allow treating the stdout of curl (or really any command you choose) as if it were a file input to another command:

> hdfs dfs -put <(curl -sS https://www.google.com/robots.txt) /robots.txt
> hdfs dfs -ls /robots.txt
-rw-r--r--   3 cnauroth supergroup       6880 2017-07-05 15:07 /robots.txt

Upvotes: 5

Related Questions