Reputation: 5087
Recently on a webinar by Couchbase, they said that Hadoop can be used for processing large log files and Couchbase for presenting the results to the application layer. They claimed that the map/reduce implementations of Couchbase and Hadoop are different, each suited to its respective use case. I was going to use Couchbase map/reduce to process a large amount of log files. Can someone please clarify the exact difference between the two map/reduce implementations? Are there any features in Hadoop that make it more suitable for processing large log files?
Thanks...
Upvotes: 1
Views: 1553
Reputation: 3962
The main difference is that Couchbase uses incremental map/reduce: it won't rescan the whole data set when you update or remove items (see the sketch below). Another difference is the magnitude of "large". If you need to process hundreds of gigabytes of logs once, then Couchbase isn't the best choice.
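For illustration, here is a minimal sketch of such an incremental view, assuming the legacy Couchbase Java SDK 1.x API; the bucket name, the document shape (a `level` field), and the design document / view names are made up for the example:

```java
import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.DesignDocument;
import com.couchbase.client.protocol.views.Query;
import com.couchbase.client.protocol.views.View;
import com.couchbase.client.protocol.views.ViewDesign;
import com.couchbase.client.protocol.views.ViewResponse;
import com.couchbase.client.protocol.views.ViewRow;

import java.net.URI;
import java.util.Arrays;

public class LogLevelView {
    public static void main(String[] args) throws Exception {
        // Hypothetical bucket "logs" on a local node.
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "logs", "");

        // The map function is plain JavaScript; "_count" is a built-in reduce.
        // The server re-runs the map only over documents added, changed or
        // deleted since the index was last updated (incremental map/reduce),
        // rather than rescanning the whole bucket.
        String map = "function (doc, meta) {"
                   + "  if (doc.level) { emit(doc.level, null); }"
                   + "}";
        DesignDocument designDoc = new DesignDocument("log_stats");
        designDoc.getViews().add(new ViewDesign("by_level", map, "_count"));
        client.createDesignDoc(designDoc);

        // Query with grouping to get a document count per log level.
        View view = client.getView("log_stats", "by_level");
        Query query = new Query();
        query.setGroup(true);
        ViewResponse response = client.query(view, query);
        for (ViewRow row : response) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
        client.shutdown();
    }
}
```

On subsequent queries only the documents that changed since the last index update are mapped again, which is exactly why this works well for a continuously updated store but not for a one-off scan of hundreds of gigabytes.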
Upvotes: 3
Reputation: 30089
Couchbase is one of many NoSQL data storage applications. Data is stored in Key / Value pairs, with the keys indexed for quick retrieval.
Conversely, data in Hadoop is not indexed (other than by file name), and pulling a specific value from a file in HDFS is much slower, possibly involving a scan of many files.
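To make the contrast concrete, here is a sketch of the keyed-lookup side, again assuming the 1.x Java SDK and a made-up key scheme; fetching one record by key is a single indexed lookup, where the HDFS equivalent would mean scanning files:

```java
import com.couchbase.client.CouchbaseClient;

import java.net.URI;
import java.util.Arrays;

public class LogLookup {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "logs", "");

        // Key scheme "log::<date>::<seq>" is an assumption for illustration.
        // One indexed lookup returns the document directly.
        Object doc = client.get("log::2013-04-02::0001");
        System.out.println(doc);
        client.shutdown();
    }
}
```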
You would typically use something like Hadoop MapReduce to process the large files, then update / populate a NoSQL store (such as Couchbase) with the results, as sketched below.
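For comparison, here is a minimal sketch of the batch side as a Hadoop job using the standard MapReduce API; the log line format (the level as the third whitespace-separated field) is an assumption for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogLevelCount {

    public static class LevelMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text level = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assumes lines like "2013-04-02 12:00:01 ERROR something broke".
            String[] fields = value.toString().split("\\s+");
            if (fields.length >= 3) {
                level.set(fields[2]);
                ctx.write(level, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log level count");
        job.setJarByClass(LogLevelCount.class);
        job.setMapperClass(LevelMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The aggregated output lands in HDFS; a follow-up step could then load those aggregates into Couchbase so the web layer reads them with cheap key lookups.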
Using a NoSQL datastore for processing large amounts of data will most probably be less efficient than using MapReduce for the same job. But the NoSQL datastore will be able to serve a web layer considerably more efficiently than a MapReduce job (which can take tens of seconds to initialize, and minutes / hours to run).
Upvotes: 2