Reputation: 147
I am trying to upload 1 million text files into HDFS. Uploading those files through Eclipse takes around 2 hours. Can anyone suggest a faster way to do this? What I am thinking of is: zip all the text files into a single archive, upload that one archive into HDFS, and finally extract the files onto HDFS using some unzipping technique. Any help will be appreciated.
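For reference, the archive-then-upload half of that idea is just a regular zip followed by a single put (a rough sketch; the directory name text_files and the HDFS path /user/hduser/ are placeholders I am assuming, and extracting the files back out on HDFS would still need a separate step):
zip -r text_files.zip text_files/
hadoop fs -put text_files.zip /user/hduser/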
Upvotes: 2
Views: 5553
Reputation: 1
If the files are on your local machine, you can use the following command:
hadoop fs -put <local_file_path> <hdfs_directory_path>
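For example, to push a whole local directory of text files up in one command (a sketch; both paths are placeholders I am assuming, and -put copies directories recursively):
hadoop fs -put /home/user/text_files /user/hduser/input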
Upvotes: 0
Reputation: 4575
DistCp is a good way to upload files to HDFS, but for your particular use case (you want to upload local files to a single-node cluster running on the same computer) the best thing is not to upload the files to HDFS at all. You can use the local filesystem (file://a_file_in_your_local_disk) instead of HDFS, so there is no need to upload the files.
See this other SO question for examples on how to do this.
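For instance, a MapReduce job can read straight from local disk by pointing its input at a file:// URI (a sketch using the stock wordcount example; the examples jar name varies by Hadoop version, and the local path and output path are placeholders I am assuming):
hadoop jar hadoop-mapreduce-examples.jar wordcount file:///home/user/text_files hdfs://localhost:9000/output/wordcount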
Upvotes: 2
Reputation: 34184
Try DistCp. DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. You can use it to copy data from your local FS to HDFS as well.
Example: bin/hadoop distcp file:///Users/miqbal1/dir1 hdfs://localhost:9000/
Upvotes: 1