Reputation: 2167
I was trying to figure out whether joins can be achieved with Apache NiFi or StreamSets, so that I can read from HBase periodically, join with other tables, and write a few fields into a Hive table.
Or is there any other workflow manager tool that supports this operation?
Upvotes: 0
Views: 515
Reputation: 12083
I'm not familiar with StreamSets, but I will try to help with NiFi. Is your flat file static? If so, are you looking to do a straight replacement of values? You should be able to use the ReplaceTextWithMapping processor for that. If it is not a straight replacement, you could pre-populate a DistributedMapCache with the values from the flat file, then use FetchDistributedMapCache to do a lookup against the HBase record(s).
If all else fails and you are comfortable with a scripting language such as Groovy, JavaScript, or Jython, you could write the "join" part using ExecuteScript or InvokeScriptedProcessor.
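To give a rough idea of what the "join" logic could look like, here is a minimal sketch in plain Python. It is not a working ExecuteScript body: the NiFi session and flow file handling are left out, and the names `load_lookup`, `join_records`, `user_id`, `clicks`, and `country` are all made up for illustration. It is written for Python 3, so it would likely need small adjustments to run under Jython inside NiFi. The dictionary plays the role the pre-populated cache would play in the lookup approach.

```python
import csv
import io


def load_lookup(flat_file_text, key_field):
    """Index the flat file's rows by key_field (stand-in for a pre-populated cache)."""
    reader = csv.DictReader(io.StringIO(flat_file_text))
    return {row[key_field]: row for row in reader}


def join_records(records, lookup, key_field, wanted_fields):
    """Left-join each record against the lookup and keep only wanted_fields."""
    joined = []
    for record in records:
        # Records without a match are kept; their lookup fields become None.
        match = lookup.get(record.get(key_field), {})
        merged = dict(match)
        merged.update(record)  # values from the record win on conflicts
        joined.append({field: merged.get(field) for field in wanted_fields})
    return joined


# Hypothetical sample data: a small flat file and two rows read from HBase.
flat_file = "user_id,country\nu1,DE\nu2,US\n"
hbase_rows = [
    {"user_id": "u1", "clicks": "7"},
    {"user_id": "u3", "clicks": "2"},
]

lookup = load_lookup(flat_file, "user_id")
result = join_records(hbase_rows, lookup, "user_id", ["user_id", "clicks", "country"])
```

Only the fields listed in `wanted_fields` survive, which matches the goal of writing just a few fields to Hive. Whether unmatched rows should be kept (as here) or dropped depends on the kind of join you need.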
There is an open Jira case (with some good progress made) on a lookup/enrichment processor that supports CSV files, properties files, and in-memory lookups.
Upvotes: 3