Reputation: 402
I'm trying to join records emitted from a KafkaSpout with records in an Oracle table (not a streaming join).
What is the best way to implement this?
I could use a cache to hold the records retrieved from the database table and then join each tuple emitted from the spout with the cached data.
I would like to get suggestions on this.
Upvotes: 0
Views: 90
Reputation: 62350
The simplest way is to open a JDBC connection to the database in open() or prepare() (depending on whether you want to do this in a spout or a bolt) and query the database for each tuple being processed to retrieve the corresponding join tuples.
Of course, you can additionally use a cache (maybe a simple HashMap) within your spout/bolt code to avoid querying the same data over and over again. For this, I would populate the cache lazily and also limit the number of entries to avoid out-of-memory errors. You might want to implement an LRU strategy to evict tuples from your cache in case it reaches its limit.
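As a rough illustration of the lazily populated, size-limited LRU cache described above, here is a minimal sketch built on java.util.LinkedHashMap in access order. The class name LruLookupCache, the method cachedEntries() and the loader function are hypothetical names chosen for this example; the loader stands in for whatever JDBC query you run against the Oracle table. The sketch is not thread-safe, so it assumes the cache is only touched from a single thread, for example from within one bolt instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of Storm): a bounded, lazily populated LRU cache.
class LruLookupCache<K, V> {
    private final Map<K, V> cache;
    // Assumption: the loader wraps the per-key database query (e.g. a JDBC SELECT).
    private final Function<K, V> loader;

    LruLookupCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently accessed entry first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, loading and caching it on a miss.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Number of entries currently held in the cache.
    int cachedEntries() {
        return cache.size();
    }
}
```

You would create one such cache in prepare() (or open()) and call get(joinKey) for each tuple, so the database is only queried on a cache miss. Keys with no matching row are not cached in this sketch, so they trigger a query every time.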
Upvotes: 1