apache storm, load balance, json

Question

I am using Kafka with Storm. Kafka emits JSON strings to Storm, and in Storm I want to distribute the load across several workers based on a key/field in the JSON. How can I do that? In my case, the key is the groupid field in the JSON string.

For example, I have JSON like this:

{groupid: 1234, userid: 145, comments: "I want to distribute all this group 1234 to one worker", size: 50, type: "group json"}
{groupid: 1235, userid: 134, comments: "I want to distribute all this group 1235 to another worker", size: 90, type: "group json"}
{groupid: 1234, userid: 158, comments: "I want to be sent to same worker as group 1234", size: 50, type: "group json"}

I tried to use the following code:

      1.  TopologyBuilder builder = new TopologyBuilder();
      2.  builder.setSpout(SPOUTNAME, kafkaSpout, 1);
      3.  builder.setBolt(MYDISTRIBUTEDWORKER, new DistributedBolt()).setFieldsGroup(SPOUTNAME,new Fields("groupid"));  <---???

I am wondering how to pass the arguments to the setFieldsGroup method in line 3. Could someone give me a hint?

Juhani

==Testing using storm 0.9.4 ============

=============source code==============

import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;


public class KafkaBoltMain {
   private static final String SPOUTNAME="TopicSpout"; 
   private static final String ANALYSISBOLT = "AnalysisWorker";
   private static final String CLIENTID = "Storm";
   private static final String TOPOLOGYNAME = "LocalTopology";


   private static class AppAnalysisBolt extends BaseRichBolt {
       private static final long serialVersionUID = -6885792881303198646L;
       private OutputCollector _collector;
       private long groupid=-1L;
       private String log="test";

       public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
           _collector = collector;
       }

       public void execute(Tuple tuple) {
           List

Adrian Seungjin Lee · Accepted Answer

I'm not sure which version of Storm you are using, but as of 0.9.4 your requirement can be implemented as follows.

builder.setBolt(MYDISTRIBUTEDWORKER, new DistributedBolt()).fieldsGrouping(SPOUTNAME, new Fields("groupid"));
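
To illustrate what a fields grouping does, here is a minimal plain-Java sketch of the idea: the grouping key is hashed and mapped onto one of the bolt's tasks, so equal keys always land on the same task. This is a conceptual model only, not Storm's actual implementation; the class name FieldsGroupingSketch, the helper taskFor and the task count of 4 are made up for the example.

```java
// Conceptual sketch only; this is NOT Storm's source code.
final class FieldsGroupingSketch {

    // Hypothetical helper: map a grouping key onto one of numTasks tasks.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int taskFor(long groupId, int numTasks) {
        return Math.floorMod(Long.hashCode(groupId), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // assumed parallelism of the receiving bolt
        long[] groupIds = {1234L, 1235L, 1234L};
        for (long id : groupIds) {
            // Both tuples with groupid 1234 get the same task index.
            System.out.println("groupid " + id + " -> task " + taskFor(id, numTasks));
        }
    }
}
```

Note that with such a scheme two different groupids can still share a task; the guarantee is only that one groupid never gets split across tasks.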

In DistributedBolt, declare the output fields (this is a separate method, not part of prepare):

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("groupid", "log"));
}

Somewhere in its execute method, you will call

collector.emit(new Values(groupid, log));

then tuples that have the same groupid will be delivered to the same instance of the next bolt.
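
As far as I know, a KafkaSpout configured with StringScheme emits each message as a single string field, so the groupid value passed to the emit call above has to be parsed out of the message first. Below is a rough plain-Java sketch of that step, under some assumptions: the class GroupIdExtractor and the method extractGroupId are hypothetical names, the regex tolerates both quoted and unquoted keys because the sample messages use unquoted keys, and -1 is returned when no groupid is found (mirroring the -1L default in the question's bolt). A real JSON parser would be more robust, for example if "groupid:" could also appear inside the comments text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Storm or storm-kafka.
final class GroupIdExtractor {

    // Matches  groupid: 1234  as well as  "groupid": 1234
    private static final Pattern GROUP_ID =
            Pattern.compile("\"?groupid\"?\\s*:\\s*(-?\\d+)");

    // Returns the first groupid found in the message, or -1 if there is none.
    static long extractGroupId(String message) {
        Matcher m = GROUP_ID.matcher(message);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String msg = "{groupid: 1234, userid: 145, size:50,type:\"group json\"}";
        System.out.println(extractGroupId(msg)); // prints 1234
    }
}
```

Inside execute, the bolt would read the raw string from the incoming tuple, call something like extractGroupId on it, and then emit the result as the groupid value.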
