Reputation: 171
I am trying to join multiple tables using NiFi. The data source may be MySQL or Redshift, and possibly something else in the future. Currently, I am using the ExecuteSQL processor for this, but the output is a single flowfile, so it may not be suitable for terabytes of data. I have also tried GenerateTableFetch, but it doesn't have a join option.
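Here are my questions:
1. Is there any alternative to the ExecuteSQL processor?
2. Is there a way to make the ExecuteSQL processor output multiple flowfiles? Currently I can split the output of ExecuteSQL using the SplitAvro processor, but I want ExecuteSQL itself to split the output.
3. GenerateTableFetch generates SQL queries based on offset. Will this slow down the process as the dataset becomes larger?
Please share your thoughts. Thanks in advance.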
Upvotes: 1
Views: 975
Reputation: 31490
1. Is there any alternative to the ExecuteSQL processor?
When joining multiple tables, you need to use the ExecuteSQL processor.
2. Is there a way to make the ExecuteSQL processor output multiple flowfiles? Currently I can split the output of ExecuteSQL using the SplitAvro processor, but I want ExecuteSQL itself to split the output.
Set the Max Rows Per Flow File property so that the ExecuteSQL processor itself splits the output into multiple flowfiles.
3. GenerateTableFetch generates SQL queries based on offset. Will this slow down the process as the dataset becomes larger?
If your source table has indexes on the Maximum-value Columns, it won't slow down the process even as your dataset grows larger.
If no indexes are created on the source table, a full table scan will always be done, which slows down the process.
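To illustrate the point about indexes, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for MySQL or Redshift. The `orders` table, the `updated_at` column, the index name, and the page size are all hypothetical, and the queries only approximate the shape of offset-based paging; the exact SQL that GenerateTableFetch emits depends on the database adapter, and query plans differ between engines.

```python
import sqlite3

def paged_queries(total_rows, page_size, table="orders", column="updated_at"):
    """Build one query per page, ordered by the tracked column (hypothetical names)."""
    return [
        f"SELECT * FROM {table} ORDER BY {column} LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, total_rows, page_size)
    ]

def query_plan(conn, sql):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# SQLite stands in for the real source database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO orders (updated_at, payload) VALUES (?, ?)",
    [((i * 7919) % 1000, f"row-{i}") for i in range(1000)],  # shuffled values
)

queries = paged_queries(total_rows=1000, page_size=250)

# Plan for the last page before and after indexing the tracked column.
plan_without_index = query_plan(conn, queries[-1])
conn.execute("CREATE INDEX idx_orders_updated_at ON orders (updated_at)")
plan_with_index = query_plan(conn, queries[-1])

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

In this sketch, the plan without the index scans the whole table and sorts it for every page, while the plan with the index reads rows in index order. Run the equivalent `EXPLAIN` on your own database to confirm how it behaves there.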
Upvotes: 2