Gowthaman V

Reputation: 171

NiFi joins using ExecuteSQL for larger tables

I am trying to join multiple tables using NiFi. The data source may be MySQL or Redshift, and possibly something else in the future. Currently I am using the ExecuteSQL processor for this, but the output is a single flowfile, so it may not be suitable for terabytes of data. I have also tried GenerateTableFetch, but it doesn't have a join option.

Here are my questions:

  1. Is there any alternative to the ExecuteSQL processor?
  2. Is there a way to make the ExecuteSQL processor output multiple flowfiles? Currently I can split the output of ExecuteSQL using the SplitAvro processor, but I want ExecuteSQL itself to split the output.
  3. GenerateTableFetch generates SQL queries based on offset. Will this slow down the process as the dataset becomes larger?

    Please share your thoughts. Thanks in advance.

Upvotes: 1

Views: 975

Answers (1)

notNull

Reputation: 31490

1. Is there any alternative to the ExecuteSQL processor?

  • If you are joining multiple tables, you need to use the ExecuteSQL processor.

2. Is there a way to make the ExecuteSQL processor output multiple flowfiles? Currently I can split the output of ExecuteSQL using the SplitAvro processor, but I want ExecuteSQL itself to split the output.

  • Starting with NiFi 1.8, you can configure the Max Rows Per Flow File property so that the ExecuteSQL processor itself splits the result into multiple flowfiles.
  • NIFI-1251 addresses this issue.
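To make the idea of a per-flowfile row limit concrete, here is a minimal Python sketch of the same pattern outside NiFi: run one join query and hand the result back in chunks of at most N rows. It is not NiFi code. The in-memory SQLite database, the `customers` and `orders` tables, and the `fetch_in_chunks` helper are all assumptions made up for illustration.

```python
import sqlite3


def fetch_in_chunks(conn, sql, max_rows):
    """Yield lists of at most max_rows rows from the result of sql."""
    cur = conn.execute(sql)
    while True:
        rows = cur.fetchmany(max_rows)
        if not rows:
            break
        yield rows


def demo():
    # Hypothetical tables, created only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"c{i}") for i in range(1, 4)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 3) + 1, i * 10.0) for i in range(1, 8)],
    )
    join_sql = (
        "SELECT o.id, c.name, o.amount "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "ORDER BY o.id"
    )
    # Each chunk plays the role of one output flowfile.
    return [len(chunk) for chunk in fetch_in_chunks(conn, join_sql, max_rows=3)]


if __name__ == "__main__":
    print(demo())
```

With seven joined rows and a limit of three, the query runs once and the result comes back as chunks of 3, 3, and 1 rows, which is roughly what a row limit per flowfile achieves without a separate split step.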

3. GenerateTableFetch generates SQL queries based on offset. Will this slow down the process as the dataset becomes larger?

  • If the source table has indexes on the Maximum-value Columns, the process won't slow down even as the dataset grows.

  • If the source table has no indexes on those columns, every generated query results in a full table scan, which slows the process down.
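As a rough illustration of what "queries based on offset" means, the Python sketch below builds one paged query per partition. The function name `offset_page_queries` and the `LIMIT ... OFFSET ...` form are assumptions for illustration only; the exact SQL that GenerateTableFetch emits depends on the configured database type and NiFi version.

```python
def offset_page_queries(table, order_column, row_count, page_size):
    """Build one LIMIT/OFFSET query per page, ordered by order_column."""
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_column} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical table and column names.
    for query in offset_page_queries("orders", "id", row_count=25, page_size=10):
        print(query)
```

Every page is ordered by the same column, so that column is the one an index needs to cover for the database to avoid sorting or scanning the whole table for each page.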

Upvotes: 2
