Jin Ma
Jin Ma

Reputation: 243

Hold a large number of flowfiles in Apache NiFi

I have a NiFi flow that consumes from Kafka and publishes the consumed data to Elasticsearch. Occasionally, Elasticsearch is down for hours due to maintenance. Since Kafka is not affected, the flow keeps consuming, and flowfiles clog all the queues. Once all incoming and outgoing queues of the PutElasticsearchHttp processor are full, PutElasticsearchHttp hangs and does not process any new incoming flowfiles, even after Elasticsearch is restarted.

I understand I can raise the back pressure thresholds (object count and data size), but no threshold setting is guaranteed to cover every situation.

I'm wondering if there's any way to address this issue:

  1. automatically stop a processor and resume it later based on a high-water-mark threshold setting, or
  2. store these flowfiles somewhere other than the flowfile queue and handle them later, or
  3. any other suggestion for dealing with this use case.
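For idea 1, one option is a small watchdog outside NiFi that polls queue depth through the NiFi REST API and stops/starts the Kafka consumer processor. A hypothetical sketch follows: the base URL, processor ID, connection ID, and water marks are all placeholders, and the endpoints (`GET /processors/{id}`, `PUT /processors/{id}/run-status`, `GET /flow/connections/{id}/status`) follow the NiFi 1.x REST API, unauthenticated for brevity:

```python
import json
import urllib.request

NIFI = "http://localhost:8080/nifi-api"        # assumed NiFi REST endpoint
CONSUMER_ID = "<consume-kafka-processor-id>"   # placeholder processor id
HIGH_WATER = 50_000   # stop consuming above this many queued flowfiles
LOW_WATER = 5_000     # resume below this

def _get(path: str) -> dict:
    with urllib.request.urlopen(f"{NIFI}{path}") as resp:
        return json.load(resp)

def run_status_body(state: str, version: int) -> dict:
    """Payload for PUT /processors/{id}/run-status: target state plus current revision."""
    return {"revision": {"version": version}, "state": state}

def set_processor_state(proc_id: str, state: str) -> None:
    # NiFi requires the current revision version with every mutation.
    version = _get(f"/processors/{proc_id}")["revision"]["version"]
    req = urllib.request.Request(
        f"{NIFI}/processors/{proc_id}/run-status",
        data=json.dumps(run_status_body(state, version)).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)

def watchdog_tick(conn_id: str) -> None:
    """Run periodically (e.g. from cron): pause or resume the Kafka consumer."""
    status = _get(f"/flow/connections/{conn_id}/status")
    queued = int(status["connectionStatus"]["aggregateSnapshot"]["flowFilesQueued"])
    if queued > HIGH_WATER:
        set_processor_state(CONSUMER_ID, "STOPPED")   # pause Kafka ingestion
    elif queued < LOW_WATER:
        set_processor_state(CONSUMER_ID, "RUNNING")   # resume once drained
```

Since the consumer is stopped rather than the data dropped, unconsumed records simply wait in Kafka until the processor is resumed.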

Thanks.

Upvotes: 0

Views: 31

Answers (2)

AndreaL77
AndreaL77

Reputation: 1

You should keep the events in Kafka and make NiFi stop ingesting data.
This can be done by reducing the queue sizes so that back pressure kicks in and stops the NiFi consumer before the NiFi system is overloaded and therefore blocked.
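Lowering a connection's queue limits can be done in the NiFi UI, or scripted against the REST API. A sketch under the assumption of an unauthenticated NiFi 1.x instance; the connection entity's `backPressureObjectThreshold` / `backPressureDataSizeThreshold` fields are the two back pressure settings, and the base URL is a placeholder:

```python
import json
import urllib.request

NIFI = "http://localhost:8080/nifi-api"  # assumed NiFi REST endpoint

def backpressure_body(conn_id: str, revision: dict,
                      object_threshold: int, size_threshold: str) -> dict:
    """Build the PUT /connections/{id} payload updating both back pressure limits."""
    return {
        "revision": revision,
        "component": {
            "id": conn_id,
            "backPressureObjectThreshold": object_threshold,
            "backPressureDataSizeThreshold": size_threshold,  # e.g. "1 GB"
        },
    }

def set_backpressure(conn_id: str, object_threshold: int, size_threshold: str) -> None:
    """Lower a connection's limits so back pressure reaches the Kafka consumer sooner."""
    with urllib.request.urlopen(f"{NIFI}/connections/{conn_id}") as resp:
        conn = json.load(resp)
    body = backpressure_body(conn_id, conn["revision"], object_threshold, size_threshold)
    req = urllib.request.Request(
        f"{NIFI}/connections/{conn_id}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)
```

With small limits on every connection between the consumer and PutElasticsearchHttp, back pressure propagates upstream and the ConsumeKafka processor stops being scheduled, leaving the backlog safely in Kafka.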

Upvotes: 0

Jin Ma
Jin Ma

Reputation: 243

I found a solution: there are two back pressure thresholds you can set on a flowfile queue, the object threshold (number of flowfiles) and the size threshold (total data size).

In my use case, it is the object threshold that gets hit almost all the time. I set this threshold to 0, which makes it unlimited, so back pressure is now triggered only when the size limit is reached. The size limit is hard to reach in my flows (there's still a risk, though).
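The rule described above (a threshold of 0 disables the count check, so only the data-size limit can engage back pressure) can be modeled as a small predicate. This is a sketch of the behavior as described in this answer, not NiFi's actual implementation:

```python
def backpressure_engaged(queued_count: int, queued_bytes: int,
                         object_threshold: int, size_threshold_bytes: int) -> bool:
    """Back pressure engages when either limit is hit; a threshold of 0 disables that check."""
    by_count = object_threshold > 0 and queued_count >= object_threshold
    by_size = size_threshold_bytes > 0 and queued_bytes >= size_threshold_bytes
    return by_count or by_size

# With the object threshold at 0, even a million queued flowfiles that are
# small in total size do not engage back pressure; only the size limit can.
```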

Upvotes: 0
