I have an HBase table called Log in which every row represents a single activity. It has the following structure:
info:date, info:ip_address, info:action, info:info
Here is some example data:
Column family: info
date | ip_address | action | info
3 March 2014 | 191.2.2.2 | delete | blabla
4 March 2014 | 191.2.2.3 | view | blabla
5 March 2014 | 191.2.2.4 | create | blabla
3 March 2014 | 191.2.2.5 | delete | blabla
4 March 2014 | 191.2.2.5 | create | blabla
4 March 2014 | 191.2.2.6 | delete | blabla
What I want to do is calculate the average total activity per date. The first step is to compute the total activity for each date:
time | total_activity
3 March 2014 | 2
4 March 2014 | 3
5 March 2014 | 1
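A minimal sketch of this per-date count, written as a plain-Java, in-memory simulation rather than a real Hadoop job (the class and method names are hypothetical). The map step emits `(date, 1)` for every row, because each row is one activity, and the reduce step sums the ones for each date. In an actual HBase job the map logic would live in a `TableMapper` that reads the `info:date` column, and the summing would live in a `Reducer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the "count activities per date" job (no Hadoop classes).
class ActivityCount {
    // One row of the Log table: info:date, info:ip_address, info:action, info:info.
    static final class LogRow {
        final String date, ipAddress, action, info;

        LogRow(String date, String ipAddress, String action, String info) {
            this.date = date;
            this.ipAddress = ipAddress;
            this.action = action;
            this.info = info;
        }
    }

    // Map phase: every row is one activity, so emit (date, 1).
    static List<Map.Entry<String, Integer>> map(List<LogRow> rows) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (LogRow row : rows) {
            pairs.add(Map.entry(row.date, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by date (sorted, as Hadoop sorts keys) and sum the 1s.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // The sample rows from the question.
    static List<LogRow> sampleRows() {
        return List.of(
                new LogRow("3 March 2014", "191.2.2.2", "delete", "blabla"),
                new LogRow("4 March 2014", "191.2.2.3", "view", "blabla"),
                new LogRow("5 March 2014", "191.2.2.4", "create", "blabla"),
                new LogRow("3 March 2014", "191.2.2.5", "delete", "blabla"),
                new LogRow("4 March 2014", "191.2.2.5", "create", "blabla"),
                new LogRow("4 March 2014", "191.2.2.6", "delete", "blabla"));
    }

    public static void main(String[] args) {
        // prints {3 March 2014=2, 4 March 2014=3, 5 March 2014=1}
        System.out.println(reduce(map(sampleRows())));
    }
}
```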
Then I want to calculate the average of those total_activity values, like this:
(2 + 3 + 1) / 3 = 2
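Note that the average of the per-date totals is simply the number of rows divided by the number of distinct dates (6 / 3 = 2 here). A small sketch of this step, assuming the per-date totals are already available in a map keyed by date (`averagePerDate` is a hypothetical helper). It divides as a `double` so that averages such as (2 + 3 + 2) / 3 are not truncated to an integer:

```java
import java.util.Map;
import java.util.TreeMap;

class ActivityAverage {
    // Average of the per-date totals: sum of the totals divided by the number of distinct dates.
    static double averagePerDate(Map<String, Integer> totals) {
        if (totals.isEmpty()) {
            throw new IllegalArgumentException("no dates to average");
        }
        long sum = 0;
        for (int total : totals.values()) {
            sum += total;
        }
        return (double) sum / totals.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = new TreeMap<>();
        totals.put("3 March 2014", 2);
        totals.put("4 March 2014", 3);
        totals.put("5 March 2014", 1);
        System.out.println(averagePerDate(totals)); // prints 2.0
    }
}
```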
How can I do this in HBase using MapReduce? As far as I can tell, a single reducer can only compute the total activity per date, not the average of those totals.
Thanks
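One possible way to get both results from a single job, sketched here as a plain-Java simulation with hypothetical names: with exactly one reducer, every date key passes through the same reducer instance, so it can keep a running sum of the totals and a count of dates across `reduce` calls and produce the average once at the end. In Hadoop this would correspond to calling `job.setNumReduceTasks(1)` and writing the average from `Reducer.cleanup(Context)`. This does not scale beyond one reducer; for large data, a second small job over the per-date totals is a common alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for a single Hadoop reducer; no Hadoop classes are used.
class SingleReducerAverage {
    private long totalActivity = 0; // running sum of the per-date totals
    private long dateCount = 0;     // number of distinct dates seen so far
    private final Map<String, Long> perDate = new TreeMap<>();

    // Called once per date key, like Reducer.reduce(); the values are the 1s emitted by the mapper.
    void reduce(String date, Iterable<Integer> ones) {
        long count = 0;
        for (int one : ones) {
            count += one;
        }
        perDate.put(date, count);
        totalActivity += count;
        dateCount++;
    }

    // Called once after all keys, like Reducer.cleanup(); only valid when this is the ONLY reducer.
    // Returns 0.0 when no dates were seen.
    double cleanup() {
        return dateCount == 0 ? 0.0 : (double) totalActivity / dateCount;
    }

    Map<String, Long> perDateTotals() {
        return perDate;
    }

    // Simulates map + shuffle: each row's date becomes a (date, 1) pair, grouped and sorted by key.
    static SingleReducerAverage run(List<String> dates) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String date : dates) {
            grouped.computeIfAbsent(date, k -> new ArrayList<>()).add(1);
        }
        SingleReducerAverage reducer = new SingleReducerAverage();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue());
        }
        return reducer;
    }

    public static void main(String[] args) {
        List<String> dates = List.of("3 March 2014", "4 March 2014", "5 March 2014",
                "3 March 2014", "4 March 2014", "4 March 2014");
        SingleReducerAverage reducer = run(dates);
        System.out.println(reducer.perDateTotals()); // per-date totals
        System.out.println(reducer.cleanup());       // prints 2.0
    }
}
```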
Answer:
I suggest you look into Scalding. In my view it is the easiest and fastest way to write production Hadoop jobs, and it integrates easily with HBase. Here is a project that helps with using HBase from Scalding: https://github.com/ParallelAI/SpyGlass/blob/master/src/main/scala/parallelai/spyglass/hbase/example/SimpleHBaseSourceExample.scala
Then have a look at the Scalding API to work out how to do what you want: https://github.com/twitter/scalding/wiki/Fields-based-API-Reference