Reputation: 1002
We have a Druid cluster with the following specs:
- 3x Coordinators & Overlords - m5.2xlarge
- 6x Middle Managers (ingest nodes with 5 slots each) - m5d.4xlarge
- 8x Historicals - i3.4xlarge
- 2x Routers & Brokers - m5.2xlarge
The cluster often goes into Restricted mode:
- All calls to the cluster get rejected with a 502 error.
- Even with 30 available slots for index_parallel tasks, the cluster only runs 10 at a time and the remaining tasks go into a waiting state.
- Loader task submission time has been increasing monotonically from 1s, 2s, ..., 6s, ..., 10s (we submit a job to load data from S3); after recycling the cluster the submission time drops, then climbs again over time.
We submit around 100 jobs per minute, but we need to scale to 300 per minute to catch up with our current incoming load.
Could someone help us with the following questions:
- How should we tune the specs of the cluster?
- Which parameters should be optimized to run the maximum number of tasks in parallel without increasing the load on the master nodes?
- Why is the loader task submission time increasing, and which parameters should we monitor here?
Upvotes: 0
Views: 610
Reputation: 181
At 100 jobs per minute, the overlord is probably being overloaded.
The overlord initiates a job by communicating with the middle managers across the cluster. It defines the tasks that each middle manager needs to complete and monitors their progress until they finish. Each job's startup carries some overhead, so that many jobs likely keep the overlord busy and prevent it from processing the other jobs you submit. This might explain why job submission time increases over time. You could increase the resources on the overlord, but it sounds like there may be a better way to ingest the data.
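As a quick check on whether the overlord is the bottleneck, its task APIs report how many tasks are running versus queued. A minimal sketch in Python, assuming the overlord (or router) is reachable at the placeholder address below:

```python
# Minimal sketch: count running vs. queued tasks via the Overlord task APIs.
# "overlord-host:8090" is a placeholder; point it at your overlord or router.
import requests

OVERLORD = "http://overlord-host:8090"

for state in ("runningTasks", "pendingTasks", "waitingTasks"):
    tasks = requests.get(f"{OVERLORD}/druid/indexer/v1/{state}", timeout=10).json()
    print(f"{state}: {len(tasks)}")
```

If the pending/waiting lists keep growing while the running count stays flat, the overlord (or the worker capacity it can hand tasks to) is the limit rather than the data volume itself.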
The recommendation would be to run far fewer jobs and have each job do more work.
If the flow of data is as continuous as you describe, a Kafka queue would probably be the best target, followed by a Druid Kafka ingestion supervisor, which is fully scalable.
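A minimal sketch of such a supervisor spec, submitted to the overlord's supervisor API, assuming a topic named events, JSON records, and a timestamp column ts (hosts, topic, datasource, and dimension names are all placeholders to adapt):

```python
# Hedged sketch: submit a Kafka ingestion supervisor to the overlord.
import requests

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "events",                                   # placeholder datasource
            "timestampSpec": {"column": "ts", "format": "iso"},       # placeholder timestamp column
            "dimensionsSpec": {"dimensions": ["country", "device"]},  # placeholder dimensions
            "granularitySpec": {"segmentGranularity": "hour", "queryGranularity": "none"},
        },
        "ioConfig": {
            "topic": "events",                                        # placeholder topic
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "kafka-1:9092,kafka-2:9092"},
            "taskCount": 4,          # number of reading task sets; scale with partitions/slots
            "taskDuration": "PT1H",  # tasks roll over and publish segments every hour
        },
        "tuningConfig": {"type": "kafka"},
    },
}

resp = requests.post("http://overlord-host:8090/druid/indexer/v1/supervisor",
                     json=supervisor_spec, timeout=30)
resp.raise_for_status()
```

One supervisor replaces the stream of small batch jobs: it keeps a fixed number of long-running tasks reading from the topic, so the overlord only manages a handful of tasks instead of hundreds of submissions per minute.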
If you need to do batch ingestion, a single index_parallel job that reads many files would likely be much more efficient than many separate jobs.
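For example, one index_parallel task pointed at a whole S3 prefix can replace many per-file submissions. A hedged sketch (bucket, prefix, datasource, and column names are placeholders; maxNumConcurrentSubTasks controls how many worker slots the single job fans out to):

```python
# Hedged sketch: one parallel batch task reading every object under an S3 prefix.
import requests

task_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "events",                                   # placeholder datasource
            "timestampSpec": {"column": "ts", "format": "iso"},       # placeholder timestamp column
            "dimensionsSpec": {"dimensions": ["country", "device"]},  # placeholder dimensions
            "granularitySpec": {"segmentGranularity": "day", "queryGranularity": "none"},
        },
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {
                "type": "s3",
                "prefixes": ["s3://my-bucket/events/2023-01-01/"],    # placeholder prefix
            },
            "inputFormat": {"type": "json"},
        },
        "tuningConfig": {
            "type": "index_parallel",
            # Fan sub-tasks out across the middle managers; keep this at or below
            # the number of worker slots you want batch ingestion to occupy.
            "maxNumConcurrentSubTasks": 20,
        },
    },
}

resp = requests.post("http://overlord-host:8090/druid/indexer/v1/task",
                     json=task_spec, timeout=30)
resp.raise_for_status()
print(resp.json())  # the overlord returns the task id
```

The overlord then tracks one parent task (plus its sub-tasks) instead of hundreds of independent jobs, which also keeps submission latency stable.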
Also consider that each task in an ingestion job creates a set of segments. By running a lot of very small jobs, you create a lot of very small segments, which is not ideal for query performance. Here is some info on how to think about segment size optimization which I think might help.
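If it helps, segment size in batch ingestion is mostly steered by the partitionsSpec inside the tuningConfig; a minimal sketch, assuming the commonly cited ballpark of roughly 5 million rows per segment (tune this to your own row size):

```python
# Hedged sketch: a tuningConfig fragment that caps rows per segment so the
# parallel sub-tasks produce reasonably sized segments instead of many tiny ones.
tuning_config = {
    "type": "index_parallel",
    "partitionsSpec": {
        "type": "dynamic",
        "maxRowsPerSegment": 5_000_000,  # ballpark target; adjust for your row size
    },
    "maxNumConcurrentSubTasks": 20,
}
```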
Upvotes: 1