Karoly Horvath

Reputation: 96346

Utilize/saturate the CPUs with multiple processes so that the processes can still run properly

I have around 10k video streams that I want to monitor. There will be a small cluster (e.g. 5-10) of heterogeneous machines monitoring these streams. Because there isn't enough CPU to process all of them at once, I will have to shuffle the streams: monitor a subset at a time, then switch to the next set.

Now, my problem is: I would like to utilize the cores as much as possible, so that I can use fewer sets and thus be able to monitor each stream more often.

Streams have different resolutions, and consequently different CPU usage.

Do you see any flaws in these designs? Any other ideas on how to do this efficiently?

My other concern is the Linux scheduler: will it be able to distribute the processes properly? There is taskset to set the CPU affinity for a process; does it make sense to manually control the allocation? (I think it does.)
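As a minimal sketch of what controlling affinity from code looks like (the programmatic equivalent of `taskset -pc`), Python exposes the Linux-only `sched_setaffinity`/`sched_getaffinity` syscalls; the core chosen below is arbitrary:

```python
import os

# All cores this process may currently run on (Linux-only API).
allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
core = min(allowed)                 # arbitrary choice for the sketch

os.sched_setaffinity(0, {core})     # pin to a single core, like `taskset -pc <core> <pid>`
assert os.sched_getaffinity(0) == {core}

os.sched_setaffinity(0, allowed)    # restore the original mask
```

Whether pinning actually helps depends on the workload; the default scheduler already balances runnable processes across cores, so manual placement mainly pays off when you want to keep a hot cache or isolate latency-sensitive work.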

Also, what's the proper way to measure the CPU usage of a process? There are /proc/PID/stat and getrusage, but both return used CPU time, while I need a percentage. (Note: this question has the lowest priority; if there's no response I will just check the source of top.) I know I can monitor the cores with mpstat.
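For what it's worth, a top-style percentage is just the delta of that CPU time over a wall-clock interval. A rough Linux-only sketch, sampling /proc/PID/stat twice (field positions per proc(5); the interval length is arbitrary):

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second

def cpu_ticks(pid):
    """Total user + system ticks consumed by pid so far."""
    with open(f"/proc/{pid}/stat") as f:
        # The comm field (2) may contain spaces; everything after the
        # closing ')' splits cleanly.
        fields = f.read().rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 in proc(5)
    return utime + stime

def cpu_percent(pid, interval=0.5):
    """Percentage of one core used by pid over the given interval."""
    t0 = cpu_ticks(pid)
    time.sleep(interval)
    t1 = cpu_ticks(pid)
    return (t1 - t0) / CLK_TCK / interval * 100.0
```

A fully busy single-threaded process comes out near 100; multithreaded processes can exceed it, so divide by the core count if you want a whole-machine figure.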

Upvotes: 2

Views: 547

Answers (1)

thkala

Reputation: 86443

Perhaps I am missing something, but why do you need to group the video streams in fixed sets?

From my understanding of the problem you will be essentially sampling each stream and processing the samples. If I were implementing something like this I would place all streams in a work queue, preferably one that supports work stealing to minimize thread starvation.

Each worker thread would get a stream object/descriptor/URI/whatever from the head of the queue, sample and process it, then move it back to the end of the queue.
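The scheme above can be sketched in a few lines; the stream names, the number of workers, and the empty `sample()` body are all placeholders for the real setup:

```python
import queue
import threading

streams = queue.Queue()
for name in ("cam-001", "cam-002", "cam-003"):  # hypothetical stream IDs
    streams.put(name)

def sample(stream):
    pass  # grab a frame/chunk from the stream and analyse it here

def worker(rounds):
    for _ in range(rounds):
        stream = streams.get()   # take from the head of the queue
        sample(stream)
        streams.put(stream)      # requeue at the tail

threads = [threading.Thread(target=worker, args=(10,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the queue itself serialises access, streams are naturally visited round-robin and faster workers simply take more turns, which is the point of the design: no fixed sets.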

CPU utilization should not be an issue, unless a single stream cannot always saturate a single core due to real-time constraints. If the latency while processing each sample is not an issue, then you have a few alternatives:

  • Use a larger number of processing threads, until all cores are fully utilized in all cases.

  • Use separate input threads to receive stream chunks and pass those for processing. This should decouple the network latencies from the actual stream processing.
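The second bullet amounts to a producer/consumer split: input threads block on the network, processing threads burn CPU, and a bounded queue in between provides backpressure. A sketch under assumed names (`fetch_chunk` and `process_chunk` stand in for the real I/O and analysis code):

```python
import queue
import threading

chunks = queue.Queue(maxsize=64)  # bounded: slow processing throttles input
DONE = object()                   # sentinel to shut the worker down
processed = []                    # for illustration; real code would act on the data

def fetch_chunk(stream_id):
    return (stream_id, b"\x00" * 188)  # stand-in for a blocking network read

def process_chunk(chunk):
    processed.append(chunk[0])         # CPU-heavy analysis would go here

def input_thread(stream_ids):
    for sid in stream_ids:
        chunks.put(fetch_chunk(sid))   # blocks only when the queue is full
    chunks.put(DONE)

def processing_thread():
    while True:
        chunk = chunks.get()
        if chunk is DONE:
            break
        process_chunk(chunk)

inp = threading.Thread(target=input_thread, args=(range(5),))
proc = threading.Thread(target=processing_thread)
inp.start(); proc.start()
inp.join(); proc.join()
```

With several input threads you would push one sentinel per processing thread (or use a shutdown flag) rather than the single `DONE` shown here.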

I am not aware of any worker queue implementation for distributed systems (as opposed to mere SMP systems), but it should be relatively easy to build one of your own if you don't find something that fits your needs...

Upvotes: 1
