bp2010

Reputation: 2472

PutHiveQL NiFi Processor extremely slow - misconfiguration?

I am setting up a simple NiFi flow that reads from an RDBMS source and writes to a Hive sink. The flow works as expected up to the PutHiveQL processor, which runs extremely slowly: it inserts approximately one record per minute.
NiFi is currently set up as a standalone instance running on one node.


The logs show an insert roughly once per minute:

(INSERT INTO customer (id, name, address) VALUES (x, x, x))

Any ideas why this might be happening, or improvements to try?

Thanks in advance

Upvotes: 1

Views: 1372

Answers (2)

notNull

Reputation: 31490

Inserting one record at a time into Hive is extremely slow.

Since you are doing regular inserts into a Hive table, change your flow to:

QueryDatabaseTable
PutHDFS

Then create a Hive Avro table on top of the HDFS directory where you stored the data.

(or)

QueryDatabaseTable
ConvertAvroToORC // in case you need to store the data in ORC format
PutHDFS

Then create a Hive ORC table on top of the HDFS directory where you stored the data.
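
As a rough sketch of the "create a table on top of the HDFS directory" step, the DDL might look something like the following. The column names come from the INSERT statement in the question; the column types, the table names `customer_avro` and `customer_orc`, and the HDFS paths are assumptions for illustration, and the `LOCATION` must match whatever directory PutHDFS is configured to write to. `STORED AS AVRO` assumes Hive 0.14 or later.

```sql
-- Option 1: external table over the Avro files written by PutHDFS.
-- Column types and the LOCATION path are hypothetical; adjust to your schema.
CREATE EXTERNAL TABLE customer_avro (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS AVRO
LOCATION '/data/nifi/customer_avro';

-- Option 2: external table over the ORC files produced by ConvertAvroToORC.
-- Same assumptions as above.
CREATE EXTERNAL TABLE customer_orc (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS ORC
LOCATION '/data/nifi/customer_orc';
```

Because the tables are external, new files that NiFi drops into the directory become visible to queries without any INSERT statements being run through Hive.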

Upvotes: 2

user3237183

Reputation:

Are you pushing one record at a time? If so, you may want to use the MergeRecord processor to create batches before pushing them to PutHiveQL.

The PutHiveQL documentation lists a Batch Size of 100. See here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hive-nar/1.5.0/org.apache.nifi.processors.hive.PutHiveQL/

Batch Size | 100 | The preferred number of FlowFiles to put to the database in a single transaction

Use the MergeRecord processor and set the number of records and/or a timeout; it should speed things up considerably.
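
To illustrate why batching helps, compare the single-row statement from the question's log with a multi-row insert. This is only a sketch: the values are placeholders, and how the flow would generate such a statement from merged records is not covered here. Multi-row `VALUES` assumes Hive 0.14 or later.

```sql
-- One statement carrying several rows, instead of one statement per row.
-- The values below are made-up placeholders.
INSERT INTO TABLE customer VALUES
  (1, 'name_1', 'address_1'),
  (2, 'name_2', 'address_2'),
  (3, 'name_3', 'address_3');
```

Each Hive INSERT carries a fixed overhead regardless of how many rows it contains, so fewer, larger statements generally finish much sooner than many single-row ones.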

Upvotes: 0
