Reputation: 9028
I use Kafka 0.10.1.1 with a custom authorizer.
From the custom authorizer, I call a microservice for authorization. It works fine for a while and then starts throwing the following exception in the logs, and the whole cluster becomes unresponsive. The exception keeps coming until I restart the cluster. Without the custom authorizer, the cluster works fine for months without any issues. Is there a bug in Kafka 0.10.1.1, or is something wrong with my custom authorizer?
TRACE [ReplicaFetcherThread-0-39], Issuing to broker 1 of fetch request kafka.server.ReplicaFetcherThread$FetchRequest@8c63320 (kafka.server.ReplicaFetcherThread)
[2017-06-30 08:29:17,473] TRACE [ReplicaFetcherThread-2-1], Issuing to broker 1 of fetch request kafka.server.ReplicaFetcherThread$FetchRequest@67a143a (kafka.server.ReplicaFetcherThread)
[2017-06-30 08:29:17,473] WARN [ReplicaFetcherThread-3-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@12d29e06 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to <HOST:PORT> (id: 1 rack: null) failed
at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:83)
at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:93)
at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:248)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
My custom authorizer calls a microservice to check authorization and caches the results in a Guava cache with an expiry time of 10 minutes.
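For context, the caching inside the authorizer looks roughly like the sketch below. The class and method names are only illustrative (the real class implements Kafka's authorizer interface), and callAuthorizationService stands in for the actual microservice call:

```java
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class CachedAuthzClient {

    // Cache authorization decisions for 10 minutes, mirroring the setup described above.
    // The key is something like "principal|operation|resource".
    private final LoadingCache<String, Boolean> cache = CacheBuilder.newBuilder()
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build(new CacheLoader<String, Boolean>() {
                @Override
                public Boolean load(String key) {
                    // Blocking call to the authorization microservice on a cache miss.
                    return callAuthorizationService(key);
                }
            });

    public boolean isAuthorized(String principal, String operation, String resource) {
        return cache.getUnchecked(principal + "|" + operation + "|" + resource);
    }

    // Placeholder for the actual HTTP/RPC call to the microservice.
    private boolean callAuthorizationService(String key) {
        // ... perform the remote lookup here ...
        return false;
    }
}
```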
Thanks
Upvotes: 0
Views: 688
Reputation: 4314
I suggest taking a thread dump to see what all the threads are doing.
Just a guess here, given there isn't much info to go on: if you have a single cache instance, what could be happening is that once the cache expires, all requests start hitting the microservice for authorization info and, since this adds latency, the thread pool gets exhausted. A thread dump can tell you how many threads are calling the microservice simultaneously.
If this is indeed the problem, one option you could consider is to use a separate cache per thread (via a thread-local variable). That way each thread's cache expires on its own schedule and won't cause all the other threads to hit the microservice at exactly the same time.
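A rough sketch of that idea, assuming a Guava Cache keyed by a "principal|operation|resource" string (the names here are illustrative, not from your authorizer):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class PerThreadAuthzCache {

    // Each broker thread gets its own cache instance, so entries expire at
    // different times and a single expiry no longer stampedes the microservice.
    private static final ThreadLocal<Cache<String, Boolean>> CACHE =
            ThreadLocal.withInitial(() -> CacheBuilder.newBuilder()
                    .expireAfterWrite(10, TimeUnit.MINUTES)
                    .build());

    public boolean isAuthorized(String key) {
        try {
            // On a miss, only the current thread pays the cost of the remote call.
            return CACHE.get().get(key, () -> callAuthorizationService(key));
        } catch (ExecutionException e) {
            // Fail closed if the microservice call throws.
            return false;
        }
    }

    // Placeholder for the remote authorization lookup.
    private boolean callAuthorizationService(String key) {
        return false;
    }
}
```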
Another, and in my opinion better, way is to remove the blocking calls to the microservice from the authorize code path completely. Instead of a fall-through cache, keep the cache up to date by refreshing it from a separate background thread. This way no latency is ever added to the authorize calls.
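One hedged sketch of that approach, using Guava's refreshAfterWrite together with an async reloader so the reload runs on a background pool while the stale value keeps being served (the names and the 5-minute refresh window are just assumptions for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class BackgroundRefreshAuthzCache {

    private final ExecutorService refreshPool = Executors.newFixedThreadPool(2);

    // Once an entry is older than 5 minutes, the next read returns the existing
    // value immediately and the reload happens on the background pool, so the
    // authorize path does not block on the microservice (except for the very
    // first load of a key).
    private final LoadingCache<String, Boolean> cache = CacheBuilder.newBuilder()
            .refreshAfterWrite(5, TimeUnit.MINUTES)
            .build(CacheLoader.asyncReloading(new CacheLoader<String, Boolean>() {
                @Override
                public Boolean load(String key) {
                    return callAuthorizationService(key);
                }
            }, refreshPool));

    public boolean isAuthorized(String key) {
        return cache.getUnchecked(key);
    }

    // Placeholder for the remote authorization lookup.
    private boolean callAuthorizationService(String key) {
        return false;
    }
}
```

Note that the very first lookup of a key still blocks; if you want to avoid even that, pre-populate the cache and refresh all entries from a scheduled background thread instead.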
Upvotes: 1