Srihari Karanth

Reputation: 2167

NiFi or StreamSets to read from HBase, join with content from a flat file, and write to Hive

I'm trying to figure out whether joins can be done with Apache NiFi or StreamSets, so that I can read from HBase periodically, join the records with other tables, and write a few fields into a Hive table.

Or is there another workflow manager tool that supports this operation?

Upvotes: 0

Views: 515

Answers (1)

mattyb

Reputation: 12083

I'm not familiar with StreamSets, but I will try to help with NiFi. Is your flat file static? If so, are you looking to do a straight replacement of values? You should be able to use the ReplaceTextWithMapping processor for that. If it is not a straight replacement, you could pre-populate a DistributedMapCache with the values from the flat file, then use FetchDistributedMapCache to do a lookup for each HBase record.

If all else fails, and you are comfortable with a scripting language such as Groovy, JavaScript, or Jython, you could write the "join" part yourself using ExecuteScript or InvokeScriptedProcessor.
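To make the scripted "join" idea concrete, here is a minimal sketch of the lookup logic in plain Python. It is not a ready-made NiFi script: the flow file handling that ExecuteScript would need is left out, and the function names, field names (`customer_id`, `region`, `amount`), and sample data are all hypothetical. It assumes the flat file is a CSV with a header row and that each HBase record has already been parsed into a dictionary.

```python
import csv
import io
import json


def load_lookup(flat_file_text, key_field):
    """Build an in-memory lookup table from CSV text, keyed on key_field."""
    reader = csv.DictReader(io.StringIO(flat_file_text))
    return {row[key_field]: row for row in reader}


def join_records(records, lookup, key_field, keep_fields):
    """Left-join each record against the lookup and keep only keep_fields."""
    joined = []
    for record in records:
        # Records with no match in the flat file get an empty lookup row.
        match = lookup.get(record.get(key_field), {})
        merged = dict(match)
        merged.update(record)  # values from the HBase record win on conflict
        joined.append({field: merged.get(field) for field in keep_fields})
    return joined


# Hypothetical sample data standing in for the flat file and the HBase rows.
FLAT_FILE = "customer_id,region\nc1,EMEA\nc2,APAC\n"
HBASE_ROWS = [
    {"customer_id": "c1", "amount": "10"},
    {"customer_id": "c3", "amount": "7"},
]

lookup = load_lookup(FLAT_FILE, "customer_id")
result = join_records(
    HBASE_ROWS, lookup, "customer_id", ["customer_id", "amount", "region"]
)
print(json.dumps(result))
```

Records without a match keep a null `region` here (a left join); dropping them instead would give inner-join behaviour. The same lookup-by-key pattern is what the DistributedMapCache approach does, with the cache taking the place of the in-memory dictionary.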

There is an open Jira case (with some good progress made) on a lookup/enrichment processor that supports CSV files, properties files, and in-memory lookups.

Upvotes: 3
