Reputation: 2714
I know it sounds silly, and I understand Hadoop is not meant for small files, but unfortunately I have received 6000+ small files of around 50 KB each.
Every time I try to run "hadoop fs -put -f /path/FOLDER_WITH_FILES /target/HDFS_FOLDER" it fails on a random file while making the connection with the namenode:
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel
I was wondering if there is any better approach to writing small files to HDFS.
Thanks
Upvotes: 0
Views: 266
Reputation: 1539
It is always advisable to merge all your small files into a Hadoop SequenceFile and process that instead. It will also give you a performance gain, since HDFS and MapReduce handle one large file far better than thousands of tiny ones.
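A minimal sketch of what that could look like, assuming the Hadoop client libraries are on the classpath and reusing the paths from the question (the output file name `packed.seq` is my own placeholder). Each small file becomes one record in the SequenceFile, keyed by its original name:

```java
import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical output path on HDFS; one file instead of 6000+
        Path out = new Path("/target/HDFS_FOLDER/packed.seq");
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : new File("/path/FOLDER_WITH_FILES").listFiles()) {
                byte[] data = Files.readAllBytes(f.toPath());
                // key = original file name, value = raw file bytes
                writer.append(new Text(f.getName()), new BytesWritable(data));
            }
        }
    }
}
```

This way only a single HDFS write connection is held open instead of one per file, which also sidesteps the per-file SocketTimeoutException you are hitting. The records can later be read back with `SequenceFile.Reader` or consumed directly by a MapReduce job via `SequenceFileInputFormat`.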
Upvotes: 0