Reputation: 58806
What is the easiest to use distributed map reduce programming system?
For example, in a distributed datastore containing many users, each with many connections, say I wanted to count the total number of connections:
Map:
    for all records of type "user"
        do for each user
            count number of connections
            return connection_count_for_one_user
Reduce:
    reduce (connection_count_for_one_user)
        total_connections += connection_count_for_one_user
Is there any mapreduce system that lets me program in this way?
Upvotes: 0
Views: 2097
Reputation: 13937
Well, I'll take a stab at making some suggestions, but your question isn't very clear.
So how are you storing your data? The storage mechanism is separate from how you apply MapReduce algorithms to the data. I'm going to assume you are using the Hadoop Distributed File System (HDFS).
The problem you illustrate actually looks very similar to the typical Hadoop MapReduce word count example. Instead of counting words, you are counting each user's connections and summing them.
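To make the analogy concrete, here is a minimal local sketch in plain Python. It is not Hadoop code and runs on a single machine; the record layout (dicts with `type` and `connections` fields) is invented purely for illustration.

```python
from functools import reduce

# Hypothetical records; the "type" and "connections" field names are assumptions.
records = [
    {"type": "user", "name": "alice", "connections": ["bob", "carol"]},
    {"type": "user", "name": "bob", "connections": ["alice"]},
    {"type": "group", "name": "admins"},  # not a user, filtered out below
    {"type": "user", "name": "carol", "connections": []},
]


def map_user(record):
    """Map step: return the connection count for one user record."""
    return len(record["connections"])


def reduce_counts(total, count_for_one_user):
    """Reduce step: fold one user's count into the running total."""
    return total + count_for_one_user


# Keep only records of type "user", map each to a count, then reduce to a sum.
user_records = (r for r in records if r["type"] == "user")
total_connections = reduce(reduce_counts, map(map_user, user_records), 0)
print(total_connections)
```

A real MapReduce framework distributes the map calls across machines and groups the emitted values by key before reducing, but the shape of the two functions stays much the same.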
Some of the options you have for applying MapReduce to data stored on HDFS are the standard Hadoop Java MapReduce framework, Hadoop Streaming, Pig, and Hive.
Which is easiest?
Well, that all depends on what you feel comfortable with. If you know Java, take a look at the standard Java framework. If you are used to scripting languages, you could use Pig or Hadoop Streaming. If you know SQL, you could take a look at using Hive QL to query data stored on HDFS. I would take a look at the documentation for each as a starting point.
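If you go the streaming route, the mapper and reducer are just programs that read lines from standard input and write tab-separated key/value lines to standard output. Below is a rough sketch of that style in Python, written as two functions so it can be tried locally. The input format (one line per user, `user_id<TAB>comma-separated connections`) and the single output key `connections` are assumptions for the example, not something your datastore necessarily provides.

```python
def mapper(lines):
    """Emit 'connections<TAB>count' for each line 'user_id<TAB>a,b,c'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _user_id, _, conns = line.partition("\t")
        # Ignore empty entries so a user with no connections counts as 0.
        count = len([c for c in conns.split(",") if c])
        yield f"connections\t{count}"


def reducer(lines):
    """Sum the counts emitted by the mapper (all lines share one key)."""
    total = 0
    for line in lines:
        _key, _, value = line.rstrip("\n").partition("\t")
        total += int(value)
    return total


if __name__ == "__main__":
    # Local dry run with made-up data. Under Hadoop Streaming the mapper and
    # reducer would normally be separate scripts reading from sys.stdin.
    sample = ["alice\tbob,carol\n", "bob\talice\n", "carol\t\n"]
    print(reducer(mapper(sample)))
```

Because every mapper output uses the same key, all counts end up at one reducer, which is fine for a single grand total.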
Upvotes: 2