surajitk

Reputation: 147

Fastest way to upload text files into HDFS(hadoop)

I am trying to upload 1 million text files into HDFS. Uploading those files through Eclipse takes around 2 hours. Can anyone suggest a faster technique? What I am thinking of is: zip all the text files into a single archive, upload that to HDFS, and finally extract the files onto HDFS with some unzipping technique. Any help will be appreciated.

Upvotes: 2

Views: 5553

Answers (3)

Migbar Abera

Reputation: 1

If the files are on your local machine, you can use the following command:

hadoop fs -put <local_file_path> <hdfs_directory_path>
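One reason per-file uploads are slow is that each `hadoop fs` invocation starts a new JVM. Since `-put` accepts multiple source paths, batching many files into one invocation amortizes that cost. A minimal sketch (file names and batch size hypothetical) that builds batched command lines:

```python
# Build batched "hadoop fs -put" command strings so that each
# JVM startup uploads many files instead of just one.
# File names, destination, and batch size are hypothetical.

def batch_put_commands(files, hdfs_dir, batch_size=1000):
    """Yield 'hadoop fs -put' command strings, batch_size files each."""
    for i in range(0, len(files), batch_size):
        batch = files[i:i + batch_size]
        yield "hadoop fs -put " + " ".join(batch) + " " + hdfs_dir

# Example: 2500 files -> 3 commands of at most 1000 files each.
files = ["file_%d.txt" % i for i in range(2500)]
for cmd in batch_put_commands(files, "/user/data"):
    print(cmd[:40] + " ...")  # inspect before running in a shell
```

The generated strings can then be run from a shell or via `subprocess`; the point is to keep the number of `hadoop fs` invocations small.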

Upvotes: 0

cabad

Reputation: 4575

DistCp is a good way to upload files to HDFS, but for your particular use case (uploading local files to a single-node cluster running on the same machine) the best option is not to upload the files to HDFS at all. You can use the local filesystem (file:///a_file_in_your_local_disk) instead of HDFS, so there is no need to upload the files.
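For instance (jar name and paths hypothetical), a MapReduce job can read straight from the local filesystem by passing a file:// URI as its input path, while still writing output to HDFS:

```shell
hadoop jar hadoop-examples.jar wordcount \
  file:///home/user/textfiles \
  hdfs://localhost:9000/user/output
```

This only makes sense on a single-node setup, since every worker would need the same local path.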

See this other SO question for examples on how to do this.

Upvotes: 2

Tariq

Reputation: 34184

Try DistCp. DistCp (distributed copy) is a tool for large inter-/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. You can use it to copy data from your local FS to HDFS as well.

Example: bin/hadoop distcp file:///Users/miqbal1/dir1 hdfs://localhost:9000/

Upvotes: 1
