EEsparaquia

Reputation: 200

Get the first two files from HDFS

Is there a way to get the first two files from HDFS using the command line? My Hadoop version is 2.7.3.

I have a folder in HDFS with multiple files that another application is putting there:

/user/Lab01/input/ingestionFile1.json
/user/Lab01/input/ingestionFile2.json
/user/Lab01/input/ingestionFile3.json
/user/Lab01/input/ingestionFile4.json

I need to work with just the first two files based on time, so I list the contents using:

 $ hdfs dfs -ls -R /user/Lab01/input

-rw-------   3 huser dev        668 2019-02-13 11:34 /user/Lab01/input/ingestionFile1.json
-rw-------   3 huser dev        668 2019-02-13 11:36 /user/Lab01/input/ingestionFile2.json
-rw-------   3 huser dev        668 2019-02-13 11:38 /user/Lab01/input/ingestionFile3.json
-rw-------   3 huser dev        668 2019-02-13 11:41 /user/Lab01/input/ingestionFile4.json

To get the first two files from the directory, I simply pipe the command to head -2:

$ hdfs dfs -ls -R /user/Lab01/input | head -2

-rw-------   3 huser dev        668 2019-02-13 11:34 /user/Lab01/input/ingestionFile1.json
-rw-------   3 huser dev        668 2019-02-13 11:36 /user/Lab01/input/ingestionFile2.json

The normal command to download files from HDFS is -get:

  hdfs dfs -get /user/Lab01/input/fileName

That's why I'm now trying to combine these two commands:

$ hdfs dfs -get /user/Lab01/input | hdfs dfs -ls -R /user/Lab01/input | head -2 

But I don't get the desired result; I just get the output from the last command (hdfs dfs -ls -R /user/Lab01/input | head -2):

-rw-------   3 huser dev        668 2019-02-13 11:34 /user/Lab01/input/ingestionFile1.json
-rw-------   3 huser dev        668 2019-02-13 11:36 /user/Lab01/input/ingestionFile2.json

Upvotes: 0

Views: 1531

Answers (1)

OneCricketeer

Reputation: 191874

You can't pipe a -get to an -ls

You need to first -ls | head -2, then awk out the filenames that are listed, and then individually -get those two.

Something like this should get the names only

hdfs dfs -ls -R /user/Lab01/input | head -2 | awk '{print $8}'
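
Run against the listing above, that should print just the two paths, something like:

/user/Lab01/input/ingestionFile1.json
/user/Lab01/input/ingestionFile2.json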

Also - How to list only the file names in HDFS

Then just add "| xargs hdfs dfs -get" to download the files.
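
Putting the pieces together, a minimal one-liner sketch (the -n1 is my own addition rather than part of the answer above, so that each path goes to its own hdfs dfs -get call and lands in the current local directory; it also assumes the listing is already ordered oldest-first):

hdfs dfs -ls -R /user/Lab01/input | head -2 | awk '{print $8}' | xargs -n1 hdfs dfs -get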

Upvotes: 2
