hadyan

Reputation: 43

Calculate Average Count Using MapReduce in HBase

I have a table called Log in which every row represents a single activity. The table structure is like this:

info:date, info:ip_address, info:action, info:info

Example data looks like this:

Column Family : info
date | ip_address | action | info
3 March 2014 | 191.2.2.2 | delete | blabla
4 March 2014 | 191.2.2.3 | view | blabla
5 March 2014 | 191.2.2.4 | create | blabla
3 March 2014 | 191.2.2.5 | delete | blabla
4 March 2014 | 191.2.2.5 | create | blabla
4 March 2014 | 191.2.2.6 | delete | blabla

What I want to do is calculate the average number of activities per date. The first step is to compute the total activity for each date:

time | total_activity
3 March 2014 | 2
4 March 2014 | 3
5 March 2014 | 1

Then I want to calculate the average of those total_activity values, so the output would be:

(2 + 3 + 1) / 3 = 2

How can I do this in HBase using MapReduce? So far I think that a single reducer is only capable of computing the total activity per date.

Thanks

Upvotes: 1

Views: 640

Answers (1)

samthebest

Reputation: 31515

I suggest you look into Scalding. It's the easiest and fastest way to write production Hadoop jobs, and it ties in easily with HBase. Here is a project to help with HBase and Scalding: https://github.com/ParallelAI/SpyGlass/blob/master/src/main/scala/parallelai/spyglass/hbase/example/SimpleHBaseSourceExample.scala

Then have a look at the Scalding API to work out how to do what you want: https://github.com/twitter/scalding/wiki/Fields-based-API-Reference
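
Whichever framework you pick, the logic is two steps: group rows by date and count them, then average the counts. Below is a minimal plain-Java sketch of just that logic, using the sample dates from the question. It does not use the HBase, Hadoop, or Scalding APIs; the class and method names (AverageDailyActivity, countPerDate, averageOfCounts) are made up for illustration, and it assumes the dates have already been read out of the info:date column as strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: in-memory stand-in for the two-step job.
class AverageDailyActivity {

    // Step 1: emit (date, 1) per row and sum per key,
    // which is what a mapper plus a summing reducer would do.
    static Map<String, Integer> countPerDate(List<String> dates) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String date : dates) {
            counts.merge(date, 1, Integer::sum);
        }
        return counts;
    }

    // Step 2: average the per-date totals.
    // Equivalent to (number of rows) / (number of distinct dates).
    static double averageOfCounts(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return 0.0; // assumption: define the average of no data as 0
        }
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return (double) total / counts.size();
    }

    public static void main(String[] args) {
        // Values of info:date from the sample table in the question.
        List<String> dates = Arrays.asList(
            "3 March 2014", "4 March 2014", "5 March 2014",
            "3 March 2014", "4 March 2014", "4 March 2014");

        Map<String, Integer> counts = countPerDate(dates);
        System.out.println(counts);                  // per-date totals
        System.out.println(averageOfCounts(counts)); // prints 2.0
    }
}
```

Since the average is just the total row count divided by the number of distinct dates, the second step only ever sees one small value per date, so running it in a single reducer (or a small follow-up job) should not be a bottleneck.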

Upvotes: 1
