Esper Performance issues

Question

We have a prototype of Esper running, but its performance is considerably lacking. I assume this is my fault rather than an inherent issue with Esper, so I am looking for help locating the bottleneck.

I am running one instance of the Esper service with the JVM memory settings -Xmx6G -Xms1G (I have tried various combinations of these values), and it can use 4 CPU cores. No other services are running during these tests, only Esper, Kafka, and ZooKeeper.
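To rule out the flags not being picked up, the limits the JVM actually sees can be printed at startup. This is a minimal sketch using only the standard `Runtime` API; the object and field names (`JvmSettingsCheck`, `JvmLimits`) are made up for illustration and are not part of the service.

```scala
object JvmSettingsCheck {
  // Snapshot of the limits the running JVM actually sees.
  final case class JvmLimits(maxHeapMiB: Long, committedHeapMiB: Long, cores: Int)

  def current(): JvmLimits = {
    val rt = Runtime.getRuntime
    val mib = 1024L * 1024L
    // maxMemory reflects -Xmx; totalMemory is the heap currently committed
    // (it starts near -Xms and grows as needed).
    JvmLimits(rt.maxMemory / mib, rt.totalMemory / mib, rt.availableProcessors)
  }

  def main(args: Array[String]): Unit = {
    val limits = current()
    println(
      s"max heap: ${limits.maxHeapMiB} MiB, " +
        s"committed heap: ${limits.committedHeapMiB} MiB, " +
        s"cores: ${limits.cores}")
  }
}
```

If the reported max heap or core count differs from what was intended, the flags are probably not reaching the process that runs the stream.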

I am using Akka Streams to stream events into Esper. The service is very simple: it reads from Kafka and inserts the events into the Esper runtime, which has 3 EPStatements that are tested and working. A single listener is attached to all 3 statements and writes the matched events to Kafka.

Some things I've tried to isolate where the performance issue is:

Remove some EPStatements
Remove all EPStatements
Remove Listener
Remove EPStatements and Listener
Remove the Esper .sendEvent(...) call (this improves performance significantly, so it seems to be an Esper issue rather than an Akka issue)

Only the last of these, removing the sendEvent call, produced any significant observable performance benefit.
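The kind of isolation test described above can be made repeatable with a small timing harness that pushes the same events through a stage and reports events per second. This is only a sketch: `ThroughputProbe` and `eventsPerSecond` are hypothetical names, the measurement is single-threaded, and it does no JIT warm-up, so the numbers are indicative rather than a proper benchmark.

```scala
object ThroughputProbe {
  // Runs `process` once per event and returns the observed rate in events per second.
  def eventsPerSecond[A](events: Seq[A])(process: A => Unit): Double = {
    val start = System.nanoTime()
    events.foreach(process)
    // Guard against a zero elapsed time on very small inputs.
    val elapsedNanos = math.max(System.nanoTime() - start, 1L)
    events.size.toDouble * 1e9 / elapsedNanos
  }

  def main(args: Array[String]): Unit = {
    val events = Vector.tabulate(100000)(i => s"event-$i")
    // Baseline: a no-op stage, standing in for the stream with sendEvent removed.
    val baseline = eventsPerSecond(events)(_ => ())
    // Hypothetical stand-in for the stage under test; swap in the real call here.
    val withWork = eventsPerSecond(events) { e => e.hashCode; () }
    println(f"baseline: $baseline%.0f/s, with work: $withWork%.0f/s")
  }
}
```

Replacing the stand-in with the real sendEvent call, outside the Akka stream, would show whether the runtime alone reproduces the low rate.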

Below is an example query we're running through Esper. It is tested and works. I have read the performance tuning section of the documentation and the query seems fine to me. All my queries follow a similar format:

select * from EsperEvent#time(5 minutes)
  match_recognize (
    partition by asset_id
    measures A as event1, B as event2, C as event3
    pattern (A Z* B Z* C)
    interval 10 seconds or terminated
    define
      A as A.eventtype = 13 AND A.win_EventID = "4624" AND A.win_LogonType = "3",
      B as B.eventtype = 13 AND B.win_EventID = "4672",
      C as C.eventtype = 13 AND (C.win_EventID = "4697" OR C.win_EventID = "7045")
)

Some code.

Here is my Akka stream:

  kafkaConsumer
    .via(parsing) // Parse the json event to a POJO for esper. Have tried without this step also, no performance impact
    .via(esperFlow) // mapAsync call to sendEvent(...)
    // Kafka is used here to measure throughput: the rate is taken from how fast messages are written to the "esper_flow_through" topic.
    .map(rec => new ProducerRecord[Array[Byte], String]("esper_flow_through", Serialization.write(rec)))
    .runWith(sink)

esperFlow (Parallelism = 4 by default):

val esperFlow = Flow[EsperEvent]
    .mapAsync(Parallelism)(event => Future {
      engine.getEPRuntime.sendEvent(event)
      event
    })

Listener:

  override def update(newEvents: Array[EventBean], oldEvents: Array[EventBean], statement: EPStatement, epServiceProvider: EPServiceProvider): Unit = Future {
    logger.info(s"Received Listener updates: Query Name: ${statement.getName} ---- ${newEvents.map(_.getUnderlying)}, $oldEvents")
    statement.getName match {
      case "SERVICE_INSTALL" => serviceInstall.increment(newEvents.length)
      case "ADMIN_GROUP" => adminGroup.increment(newEvents.length)
      case "SMB_SHARE" => smbShare.increment(newEvents.length)
    }
    newEvents.map(_.getUnderlying.toString).toList
      .foreach(queryMatch => {
        val record: ProducerRecord[Array[Byte], String] = new ProducerRecord[Array[Byte], String]("esper_output", queryMatch)
        producer.send(record)
      })
  }

Performance observations:

The input stream has a rate of ~2.4k events per second.
Esper is unable to keep up from the beginning, maxing out at ~600 per second.
Esper throughput gradually decreases.
Eventually Esper throughput stabilises at <100 per second.
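To put these rates in perspective, the gap between input and processed rate translates directly into a growing backlog. The arithmetic is sketched below with hypothetical helper names; it assumes constant rates and takes 100 per second as the steady-state figure, which is the upper bound of the "<100" observation.

```scala
object BacklogEstimate {
  // Events accumulating per second when the consumer is slower than the producer.
  def backlogGrowthPerSecond(inputRate: Double, processedRate: Double): Double =
    math.max(inputRate - processedRate, 0.0)

  // Unprocessed events after `seconds`, assuming both rates stay constant.
  def backlogAfter(seconds: Double, inputRate: Double, processedRate: Double): Double =
    backlogGrowthPerSecond(inputRate, processedRate) * seconds

  def main(args: Array[String]): Unit = {
    // At the initial peak: 2400 in, 600 processed.
    println(backlogGrowthPerSecond(2400, 600)) // prints 1800.0
    // After one minute at the degraded rate: 2400 in, 100 processed.
    println(backlogAfter(60, 2400, 100)) // prints 138000.0
  }
}
```

So even at the initial peak the service falls behind by about 1,800 events every second. If the stream applies backpressure, that backlog would be expected to show up as consumer lag on the input topic.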

Profiling shows nothing out of the ordinary.

The rate seems very low, so I assume I am missing something in the Esper configuration?

Our target throughput is ~10k events per second. We are a long way from this, and a similar POC we have in Spark gets closer to the target.

Update:

Following @user650839's comments, I was able to improve my throughput to a steady 1k per second. Both of these queries produce the same throughput:

select * from EsperEvent(eventtype = 13 and win_EventID in ("4624", "4672", "4697", "7045"))#time(5 minutes)
     match_recognize (
       partition by asset_id
       measures A as event1, B as event2, C as event3
       pattern (A B C)
       interval 10 seconds or terminated
       define
         A as A.eventtype = 13 AND A.win_EventID = "4624" AND A.win_LogonType = "3",
         B as B.eventtype = 13 AND B.win_EventID = "4672",
         C as C.eventtype = 13 AND (C.win_EventID = "4697" OR C.win_EventID = "7045"))

create context NetworkLogonThenInstallationOfANewService
start EsperEvent(eventtype = 13 AND win_EventID = "4624" AND win_LogonType = "3")
end pattern [
 b=EsperEvent(eventtype = 13 AND win_EventID = "4672") ->
 c=EsperEvent(eventtype = 13 AND (win_EventID = "4697" OR win_EventID = "7045"))
 where timer:within(5 minutes)
]

context NetworkLogonThenInstallationOfANewService select * from EsperEvent output when terminated

However, 1k per second is still too slow for our needs.
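For reference, the change in the first updated query is that the event-type test moved into the stream filter, so events that fail it never enter the 5-minute window or the match_recognize state. The effect of that filter can be sketched in plain Scala. `SampleEvent` is a simplified, hypothetical stand-in for the real EsperEvent POJO, and the sample data is invented for illustration.

```scala
object PreFilterSketch {
  // Simplified, hypothetical stand-in for the EsperEvent POJO:
  // only the fields the filter looks at, plus the partition key.
  final case class SampleEvent(assetId: String, eventtype: Int, winEventId: String)

  private val relevantIds = Set("4624", "4672", "4697", "7045")

  // Mirrors the EPL filter:
  // eventtype = 13 and win_EventID in ("4624", "4672", "4697", "7045")
  def passesFilter(e: SampleEvent): Boolean =
    e.eventtype == 13 && relevantIds.contains(e.winEventId)

  def main(args: Array[String]): Unit = {
    // Invented sample data: two events match, two do not.
    val events = Seq(
      SampleEvent("a1", 13, "4624"),
      SampleEvent("a1", 13, "4688"),
      SampleEvent("a1", 7, "4672"),
      SampleEvent("a1", 13, "7045"))
    val kept = events.filter(passesFilter)
    println(s"${kept.size} of ${events.size} events reach the pattern")
  }
}
```

Counting how many real events pass this predicate would indicate how much work the filter removes from the pattern matching, and therefore how much of the remaining cost lies elsewhere.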
