Salem Moheissen

Reputation: 21

Syslog to Kafka: Most performant workflow in NiFi?

I work for a large company in France, and we aim to ingest the syslog logs (RFC 5424 format) of all our servers (nearly 1,400 servers) into Kafka through NiFi. We chose NiFi because we want to route logs to their associated topics depending on the appname found.

So we will have a lot of small flowfiles.

Currently, we are hitting performance limits: we can't ingest more than 5k msg/s, and we want to ingest more than 50k msg/s. Of course, if possible, we want to process as many messages as possible.

Our current flow is: ListenSyslog (batch size 1 + parsing enabled) => RouteOnAttribute (does a lookup to get the target topic from the appname) => PublishKafka.

Can you give me some advice, please?

I'm thinking about this workflow: ListenSyslog (batch size 1000 + parsing disabled) => PartitionRecord (GrokReader to extract the appname and convert to Avro, grouping on appname) => RouteRecord (with an embedded lookup, to route records with an empty appname or no matching topic) => PublishKafkaRecord (I understand that it splits a flow file containing multiple records into one message per record).
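To make the intent of the partition and route steps concrete, here is a rough Python sketch of the logic outside NiFi. It is not NiFi code: the regex, the `TOPIC_BY_APPNAME` table, and the `partition_by_topic` function are all hypothetical names invented for illustration, and the regex only covers the RFC 5424 header fields up to MSGID.

```python
import re
from collections import defaultdict

# RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID ...
# Simplified pattern for illustration only; it does not validate each field.
HEADER = re.compile(r"^<(\d{1,3})>(\d{1,2}) (\S+) (\S+) (\S+) (\S+) (\S+) ")

# Hypothetical appname -> topic lookup table.
TOPIC_BY_APPNAME = {"sshd": "logs.sshd", "nginx": "logs.nginx"}


def partition_by_topic(lines):
    """Group raw syslog lines by target topic; unroutable lines go under None."""
    groups = defaultdict(list)
    for line in lines:
        match = HEADER.match(line)
        appname = match.group(5) if match else None
        # "-" is the RFC 5424 nil value, meaning no appname was supplied.
        if appname in (None, "-"):
            topic = None
        else:
            topic = TOPIC_BY_APPNAME.get(appname)
        groups[topic].append(line)
    return dict(groups)
```

Each non-`None` group corresponds to one batch that would be published to a single topic, and the `None` group collects the messages with an empty appname or no matching topic.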

Thank you for your help.

Happy new year to all!

Upvotes: 2

Views: 923

Answers (2)

Bryan Bende

Reputation: 18660

The flow you suggested at the end of your question is on the right track. Basically, you want to batch many messages together into a single flow file.
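As a back-of-the-envelope illustration of why batching matters, the sketch below counts how many flow files the listener would create per second at a given batch size. The function name is hypothetical, and the figure is only a lower bound: a later partitioning step can split each batch into one flow file per distinct appname.

```python
def flowfiles_per_second(messages_per_second, batch_size):
    """Flow files created per second when each holds up to batch_size messages."""
    # Ceiling division: a partially filled batch still costs one flow file.
    return -(-messages_per_second // batch_size)


# Target rate from the question, at batch size 1 and batch size 1000.
unbatched = flowfiles_per_second(50_000, 1)
batched = flowfiles_per_second(50_000, 1000)
```

At the 50k msg/s target, batch size 1 means 50,000 flow files per second, while batch size 1000 means about 50, so the per-flow-file overhead is paid far less often.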

Depending on which version of NiFi you are using, newer versions have a Syslog5424Reader:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.8.0/org.apache.nifi.syslog.Syslog5424Reader/index.html

This would probably be easier to use than the GrokReader. Click the additional details link to see the schema it produces.

Also, there are ListenTCPRecord and ListenUDPRecord, which you could experiment with in place of ListenSyslog. So you could have ListenTCPRecord/ListenUDPRecord with a Syslog5424Reader and an AvroWriter, then proceed with your suggested flow. You will have to do some testing to see whether it is better to just use ListenSyslog or to use the record variants.

Other things to consider when tuning ListenSyslog/ListenTCP/ListenUDP:

https://bryanbende.com/development/2016/05/09/optimizing-performance-of-apache-nifis-network-listening-processors

Upvotes: 2

Dennis Jaheruddin

Reputation: 21563

For a quick reference, compare your specifications against this throughput reference:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_planning-your-deployment/content/ch_hardware-sizing.html

If your capacity seems to be sufficient based on this, I would recommend analyzing the problem as follows and eliminating likely suspects one by one:

  1. How many NiFi nodes are ingesting?
  2. Does adding/reducing one make a difference?
  3. Which processor is the bottleneck (the ingest or one of the follow-up steps)?
  4. Could your source/network be the bottleneck?
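One rough way to interpret the "does adding a node make a difference" check is to compare the measured throughput against what linear scaling would predict. The helper below is a hypothetical sketch, and the rates you feed it must come from your own measurements.

```python
def scaling_efficiency(rate_before, nodes_before, rate_after, nodes_after):
    """Ratio of the observed rate to the rate expected from linear scaling."""
    # Expected rate if throughput grew in proportion to the node count.
    expected = rate_before * nodes_after / nodes_before
    return rate_after / expected
```

A value close to 1.0 suggests the NiFi nodes themselves were the limit, since adding one helped proportionally. A value well below 1.0 suggests the bottleneck may sit elsewhere, for example in the source, the network, or Kafka.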

Upvotes: 1
