Reputation: 1
I have about 20 million files stored on my local file system; each file is 5 KB and represents one tweet.
They are stored using the following layout:
/home/username/tweets/$tag/$year/$month/$day/$tweetid.txt
Example 1: /home/username/tweets/SCP/2014/04/11/9989443342233.txt
Example 2: /home/username/tweets/WDR/2014/02/08/5890321764568.txt
Is it possible to write a MapReduce Java program that moves all tweets with a given tag into a single HDFS directory for that tag?
Are there any similar examples?
Upvotes: 0
Views: 107
Reputation: 6186
As described in https://blog.cloudera.com/blog/2009/02/the-small-files-problem/, HDFS handles a very large number of small files poorly.
Pack the files into a SequenceFile first, then upload that to HDFS.
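To group by tag before packing, something like the following might help. It is only a rough sketch in plain Java (no Hadoop dependency) that derives the tag and a record key from the local path layout in the question. The class name `TweetPaths`, the key format, and the `/tweets/<tag>/<tag>.seq` target naming are my own assumptions, not anything prescribed by Hadoop. The actual write step, for instance with Hadoop's `SequenceFile.Writer`, is left out.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps local tweet files to a per-tag target and record key.
final class TweetPaths {
    private TweetPaths() {}

    // Relative path below the root; must look like tag/year/month/day/tweetid.txt.
    private static Path relative(Path root, Path file) {
        Path rel = root.relativize(file);
        if (rel.getNameCount() != 5) {
            throw new IllegalArgumentException("Unexpected layout: " + file);
        }
        return rel;
    }

    // The tag is the first directory below the root, e.g. "SCP".
    static String tagOf(Path root, Path file) {
        return relative(root, file).getName(0).toString();
    }

    // Assumed record key: year/month/day/tweetid, without the .txt suffix.
    static String recordKey(Path root, Path file) {
        Path rel = relative(root, file);
        String name = rel.getName(4).toString();
        String id = name.endsWith(".txt") ? name.substring(0, name.length() - 4) : name;
        return rel.getName(1) + "/" + rel.getName(2) + "/" + rel.getName(3) + "/" + id;
    }

    // Assumed naming scheme for the per-tag SequenceFile in HDFS.
    static String targetFor(String hdfsBase, String tag) {
        return hdfsBase + "/" + tag + "/" + tag + ".seq";
    }

    // Groups local files by tag so each group can be packed into one file.
    static Map<String, List<Path>> groupByTag(Path root, List<Path> files) {
        Map<String, List<Path>> byTag = new TreeMap<>();
        for (Path file : files) {
            byTag.computeIfAbsent(tagOf(root, file), t -> new ArrayList<>()).add(file);
        }
        return byTag;
    }
}
```

Each group returned by `groupByTag` would then be written as key/value records (record key, tweet text) into the file named by `targetFor`. With 20 million files you would probably stream the directory walk per tag rather than hold every path in memory at once.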
Upvotes: 1