Johnny

Reputation: 7323

How do I implement this topology in Storm?

I'm new to Storm, so be gentle :-)

I want to implement a topology that is similar to the RollingTopWords topology in the Storm examples. The idea is to count the frequency of words emitted. Basically, the spouts emit words at random, the first level bolts count the frequency and pass them on. The twist is that I want the bolts to pass on the frequency of a word only if its frequency in one of the bolts exceeded a threshold. So, for example, if the word "Nathan" passed the threshold of 5 occurrences within a time window on one bolt then all bolts would start passing "Nathan"'s frequency onwards.

What I thought of doing is having another layer of bolts which would have the list of words which have passed a threshold. They would then receive the words and frequencies from the previous layer of bolts and pass them on only if they appear in the list. Obviously, this list would have to be synchronized across the whole layer of bolts.

Is this a good idea? What would be the best way of implementing it?

Update: What I'm hoping to achieve is a situation where communication is minimized, i.e. each node in my use case is simulated by a spout and an attached bolt which does the local counting. I'd like that bolt to emit only words that have passed a threshold, either in the bolt itself or in another one. So every bolt will need a list of the words that have passed the threshold. There will be a central repository that holds that list and communicates with the bolts to pass the information on.

What would be the best way of implementing that?

Upvotes: 0

Views: 176

Answers (1)

filip

Reputation: 414

That shouldn't be too complicated. Keep the counts in a HashMap and don't emit a word until its count reaches the threshold. That is just one if-else statement.
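To make the if-else concrete, here is a minimal sketch of the counting logic in plain Java, kept free of Storm classes so it stands on its own. `ThresholdCounter` and `record` are hypothetical names, not part of Storm, and I'm assuming "reach" means the count is greater than or equal to the threshold. The time window from the question is not handled here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not a Storm class): holds counts back until a word
// has reached the threshold, then reports the running count on every call.
class ThresholdCounter {
    private final int threshold;
    private final Map<String, Integer> counts = new HashMap<>();

    ThresholdCounter(int threshold) {
        this.threshold = threshold;
    }

    // Records one occurrence of the word. Returns the current count if it
    // has reached the threshold, or an empty Optional if it has not yet.
    Optional<Integer> record(String word) {
        int count = counts.merge(word, 1, Integer::sum);
        if (count >= threshold) {
            return Optional.of(count);   // this is where a bolt would emit
        } else {
            return Optional.empty();     // keep it stored, emit nothing
        }
    }
}
```

Inside a bolt you would call something like `record(word)` from `execute` and emit a tuple only when the result is present.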

About the synchronization - I don't think you need it, because with this kind of problem (counting words) you want one and only one task to receive a specific word. The one task that receives the word (e.g. "Nathan") will then be the only one emitting its frequency. For that you should use fields grouping.
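In a topology this is wired up by calling `fieldsGrouping` with `new Fields("word")` when you declare the counting bolt (the field name "word" is an assumption about what your spout declares). The sketch below only simulates the effect in plain Java: every occurrence of a word lands on the same task, so that task alone holds the full count. `FieldsGroupingSketch`, `taskFor` and `route` are hypothetical names, and the hash-modulo routing is an illustration, not necessarily how Storm picks the task internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of fields grouping on a "word" field.
class FieldsGroupingSketch {

    // Picks a task index from the word's hash. Illustrative only:
    // Storm's own hashing may differ, but the same word always maps
    // to the same task, which is the property that matters here.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Routes each word to its task and keeps one counter map per task.
    static List<Map<String, Integer>> route(List<String> words, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<>());
        }
        for (String word : words) {
            tasks.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
        }
        return tasks;
    }
}
```

Because only one task ever sees "Nathan", the threshold check can stay local to that task and no shared list is needed.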

Upvotes: 1
