Reputation: 21
I currently work for a large company in France, and we aim to ingest the syslog logs (RFC 5424 format) of all our servers (nearly 1,400 servers) into Kafka through NiFi. We chose NiFi because we want to route logs to their associated topics depending on the appname found.
So we will have a lot of small flowfiles.
Currently, we are running into performance limits: we can't ingest more than 5k msg/s, and we want to ingest more than 50k msg/s. Of course, if possible, we want to process as much as possible.
We have: ListenSyslog (batch size 1 + parsing enabled) => RouteOnAttribute (does a lookup to get the target topic from the appname) => PublishKafka.
Can you give me some advice, please?
I'm thinking about this workflow: ListenSyslog (batch size 1000 + parsing disabled) => PartitionRecord (GrokReader to get the appname and convert to Avro, grouping on appname) => RouteRecord (with an embedded lookup, to route records with an empty appname or no matching topic) => PublishKafkaRecord (I understand that it splits a flowfile with multiple records into one message per record).
Thank you for your help.
Happy new year to all!
Upvotes: 2
Views: 923
Reputation: 18660
The flow you suggested at the end of your question is on the right track. Basically, you want to batch many messages together into a single flow file, so that each processor handles one flow file with many records instead of many tiny flow files.
Depending on which version of NiFi you are using, newer versions have a Syslog5424Reader:
This would probably be easier to use than the GrokReader. Click the additional details link to see the schema it produces.
Also, there are ListenTCPRecord and ListenUDPRecord, which you could experiment with in place of ListenSyslog. So you could have ListenTCPRecord/ListenUDPRecord with a Syslog5424Reader and an AvroWriter, then proceed with your suggested flow. You will have to do some testing to see whether it is better to just use ListenSyslog or to use the record variants.
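To make the batch-then-partition idea concrete, here is a rough Python sketch of what the flow does conceptually. It is not NiFi configuration and not how NiFi implements it. It assumes one RFC 5424 message per line, and the regex only covers the header fields up to APP-NAME. The topic table and the names `TOPIC_BY_APP`, `UNMATCHED`, `app_name`, and `partition_by_topic` are made up for illustration.

```python
import re
from collections import defaultdict

# RFC 5424 header starts with: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME ...
# Only the fields up to APP-NAME are matched here.
HEADER = re.compile(r"^<(\d{1,3})>(\d{1,2}) (\S+) (\S+) (\S+) ")

# Hypothetical appname -> topic lookup table (assumption, not from the question).
TOPIC_BY_APP = {"sshd": "logs.auth", "nginx": "logs.web"}
# Hypothetical topic for messages with no appname or no matching topic.
UNMATCHED = "logs.unmatched"


def app_name(line):
    """Return the APP-NAME field, or None if it is missing ("-") or unparsable."""
    match = HEADER.match(line)
    if match is None:
        return None
    name = match.group(5)
    return None if name == "-" else name


def partition_by_topic(batch):
    """Group a batch of raw syslog lines by target topic in a single pass."""
    groups = defaultdict(list)
    for line in batch:
        topic = TOPIC_BY_APP.get(app_name(line), UNMATCHED)
        groups[topic].append(line)
    return dict(groups)
```

The point is that one batch of, say, 1000 messages turns into a handful of groups (one per topic) rather than 1000 individual units of work, which is roughly what grouping on appname in PartitionRecord gives you before publishing.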
Other things to consider when tuning ListenSyslog/ListenTCP/ListenUDP:
Upvotes: 2
Reputation: 21563
For a quick reference, compare your specifications against this throughput reference:
If your capacity seems to be sufficient based on this, I would recommend analyzing the problem as follows and eliminating likely suspects one by one:
Upvotes: 1