Nitzan Shaked

Reputation: 13598

Heroku load average alarmingly high

I am trying to understand why some requests in my Python Heroku app take more than 30 seconds, even simple requests that do absolutely nothing.

One of the things I've done is look into the load average on my dynos. I did three things:

1) Look at the Heroku logs. Once in a while, it will print the load. Here are examples:

Mar 16 11:44:50 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 11.900

Mar 16 11:45:11 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 8.386

Mar 16 11:45:32 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 6.798

Mar 16 11:45:53 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 8.031

2) Run "heroku run uptime" several times, each time hitting a different machine (verified by running "hostname"). Here is sample output from just now:

13:22:09 up 3 days, 13:57, 0 users, load average: 15.33, 20.55, 22.51

3) Measure the load average on the machines on which my dynos live by using psutil to send metrics to graphite. The graphs confirm numbers of anywhere between 5 and 20.
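
A minimal sketch of what step 3 can look like. It makes a few assumptions that are not in the original post: it reads the load with the standard library's `os.getloadavg()` rather than psutil, it sends to Graphite's plaintext listener (conventionally port 2003), and the `heroku.dyno` prefix and the `load1`/`load5`/`load15` metric names are hypothetical.

```python
import os
import socket
import time


def format_load_metrics(load, prefix, timestamp):
    """Build Graphite plaintext lines of the form '<path> <value> <timestamp>'."""
    names = ("load1", "load5", "load15")  # hypothetical metric names
    return "".join(
        f"{prefix}.{name} {value:.3f} {int(timestamp)}\n"
        for name, value in zip(names, load)
    )


def send_load_metrics(host, port=2003, prefix="heroku.dyno"):
    """Send the current 1/5/15-minute load averages to a Graphite listener."""
    # os.getloadavg() returns the same three numbers that `uptime` prints.
    payload = format_load_metrics(os.getloadavg(), prefix, time.time())
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

Calling `send_load_metrics("graphite.example.com")` on a timer (the hostname is a placeholder) would produce the kind of graph described above. Note that these numbers come from the machine the process sees, which is not necessarily the per-dyno figure Heroku logs.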

I am not sure whether this explains why simple requests take so long, but can anyone say why the load average numbers on Heroku are so high?

Upvotes: 3

Views: 1942

Answers (2)

donnoman

Reputation: 161

Heroku sub-virtualizes hosts into the guest dynos you are using via LXC. When you run 'uptime' you are seeing the whole host's uptime output, NOT your container's, and as pointed out by @jon-mountjoy, 'heroku run' gives you a new LXC container, not one of your running dynos.

Heroku’s dyno load calculation also differs from the traditional UNIX/LINUX load calculation.

The Heroku load average reflects the number of CPU tasks that are in the ready queue (i.e. waiting to be processed). The dyno manager takes the count of runnable tasks for each dyno roughly every 20 seconds. An exponentially damped moving average is computed from the counts of runnable tasks over the previous 30 minutes. In that calculation, period is 1, 5, or 15 minutes (expressed in seconds), count_of_runnable_tasks is the number of tasks in the queue at a given point in time, and avg is the exponential load average calculated in the previous iteration.
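
The description above names the inputs but does not show the formula itself. A sketch of one update step, assuming the standard form of an exponentially damped moving average (this is an assumption, not Heroku's confirmed implementation; the `elapsed` parameter and the function name are hypothetical):

```python
import math


def update_load_average(avg, count_of_runnable_tasks, elapsed, period):
    """Return the new load average after one sample.

    avg: load average from the previous iteration
    count_of_runnable_tasks: tasks in the ready queue at this sample
    elapsed: seconds since the previous sample (roughly 20 per the text)
    period: 60, 300 or 900 seconds for the 1-, 5- and 15-minute averages
    """
    decay = math.exp(-elapsed / period)  # weight kept by the old average
    return avg * decay + count_of_runnable_tasks * (1.0 - decay)


# Starting from an idle dyno, feed a steady 8 runnable tasks every 20 seconds.
avg_1m = 0.0
for _ in range(200):
    avg_1m = update_load_average(avg_1m, 8, elapsed=20, period=60)
```

With a constant input the average converges on that input, which is why a sustained 1-minute reading of around 8 suggests about 8 tasks are consistently waiting for CPU.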

The difference between Heroku's load average and Linux is that Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system.

On CPU-bound dynos I would presume this wouldn't make much difference. On an I/O-bound dyno, the load averages reported by Heroku would be much lower than what you would see if you could get a TRUE uptime on an LXC container.

You can also have your running dynos emit periodic load messages by enabling log-runtime-metrics.

Upvotes: 1

Jon Mountjoy

Reputation: 4526

Perhaps it's expected dyno idling?

PS. I suspect there's no point running heroku run uptime - that will run it in a new one-off dyno every time.

Upvotes: 0
