Zombie

Reputation: 402

Joining tuples with an RDBMS table using Apache storm

I'm trying to join records emitted from a KafkaSpout with records in an Oracle table (not a streaming join).

What is the best way to implement this?

I could load the records from the DB table into a cache and then join each tuple emitted from the spout against the cached data.

Would like to get suggestions on this.

Upvotes: 0

Views: 90

Answers (1)

Matthias J. Sax

Reputation: 62350

The simplest way is to open a JDBC connection to the database in open() or prepare() (depending on whether you want to do this in a spout or a bolt) and query the database for each tuple being processed to retrieve the corresponding join tuples.

Of course, you can additionally use a cache (maybe a simple HashMap) within your spout/bolt code to avoid querying the same data over and over again. For this, I would populate the cache lazily and also limit the number of entries to avoid out-of-memory errors. You might want to implement an LRU strategy to evict entries from your cache in case it reaches its limit.
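
A minimal sketch of such a lazily populated, size-limited LRU cache, built on java.util.LinkedHashMap in access order. The class name LazyLruCache and the loader function are made up for illustration; the loader stands in for whatever JDBC lookup you run per join key, and none of this is part of the Storm API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: bounded cache that loads missing keys on demand. */
class LazyLruCache<K, V> {
    private final Function<K, V> loader; // assumption: wraps the per-key DB query
    private final Map<K, V> entries;

    LazyLruCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; drops the least recently used entry
                // once the cache grows beyond its limit.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value); // keys with no match are not cached
            }
        }
        return value;
    }

    int size() {
        return entries.size();
    }
}
```

You would create one instance in prepare() (or open()) next to the JDBC connection and call get(joinKey) for each incoming tuple. The class is not synchronized, so keep one instance per spout/bolt instance rather than sharing it across threads. Note that cached rows can become stale if the table changes while the topology is running.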

Upvotes: 1
