user1386101

Reputation: 1944

Download a file from HDFS cluster

I am developing an API that uses HDFS as distributed file storage. I have built a REST API that lets a server mkdir, ls, create, and delete files on the HDFS cluster using WebHDFS. But since WebHDFS does not support downloading a file, are there any solutions for achieving this? My setup is a server that runs my REST API and communicates with the cluster. I know the OPEN operation supports reading a file's content, but suppose I have a file that is 300 MB in size: how can I download it from the HDFS cluster? I was thinking of requesting the file directly from the datanodes, but that approach seems flawed, because a 300 MB file would put a huge load on my proxy server. Is there a streaming API to achieve this?

Upvotes: 2

Views: 2976

Answers (2)

Tariq

Reputation: 34184

As an alternative, you could make use of the streamFile operation provided by the DataNode API:

wget http://$datanode:50075/streamFile/demofile.txt

It won't read the file as a whole into memory, so the burden will be low, IMHO. I have tried it, but only on a pseudo-distributed setup, and it works fine there. You can give it a try on your fully distributed setup and see if it helps.
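Since the endpoint returns raw bytes over HTTP, a client can consume the response in fixed-size chunks rather than buffering all 300 MB at once. A minimal sketch in Python (the datanode URL mirrors the hypothetical one in the wget example above):

```python
import shutil
import urllib.request

def stream_download(url, dest_path, chunk_size=64 * 1024):
    """Copy the HTTP response body to dest_path in fixed-size chunks,
    so at most chunk_size bytes are held in memory at a time."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, length=chunk_size)

# Hypothetical datanode URL, matching the wget example:
# stream_download("http://datanode:50075/streamFile/demofile.txt", "demofile.txt")
```

The same chunked-copy pattern applies whatever HTTP client you use; the key point is never calling a whole-body `read()` on a large response.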

Upvotes: 2

abhinav

Reputation: 1282

One way that comes to mind is to use a proxy worker that reads the file using the Hadoop FileSystem API and writes it out as a normal local file, then provides a download link to that file. The downsides are:

  1. Scalability of the proxy server
  2. Files may be too large to fit on a single proxy server's disk
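The staging step in this proxy-worker idea can be sketched as follows. This is only an illustration, not the Hadoop API itself: `source` stands in for whatever readable stream the worker opens against HDFS, and the staging directory and link prefix are hypothetical names:

```python
import os
import shutil
import uuid

STAGING_DIR = "staging"  # hypothetical directory served by the proxy's web server

def stage_file(source, original_name, staging_dir=STAGING_DIR):
    """Stream `source` (any readable binary file-like object, e.g. a
    stream opened against HDFS) to a uniquely named local file, and
    return the download link a client would be handed."""
    os.makedirs(staging_dir, exist_ok=True)
    local_name = f"{uuid.uuid4().hex}-{original_name}"
    with open(os.path.join(staging_dir, local_name), "wb") as out:
        # Copy in 64 KiB chunks so the proxy never buffers the whole file in memory
        shutil.copyfileobj(source, out, length=64 * 1024)
    return f"/downloads/{local_name}"
```

Memory stays bounded, but the downsides listed above remain: the staged copy still has to fit on the proxy's disk, and every download passes through this one server.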

Upvotes: 0
