Karoly Horvath

Reputation: 96346

Utilize/saturate the CPUs with multiple processes so that the processes can still run properly

I have around 10k video streams that I want to monitor. There will be a small cluster (e.g. 5-10) of heterogeneous machines monitoring these streams. Because there isn't enough CPU to process all of them at once, I will have to shuffle the streams: monitor a subset at a time, then switch to the next set.

Now, my problem is: I would like to utilize the cores as much as possible, so that I can use fewer sets and thus be able to monitor each stream more often.

Streams have different resolutions, and consequently different CPU usage.

Do you see any flaws in these designs? Any other ideas on how to do this efficiently?

My other concern is the Linux scheduler: will it be able to distribute the processes properly? There is taskset to set the CPU affinity for a process; does it make sense to manually control the allocation? (I think it does.)
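As a minimal sketch of what controlling affinity from code looks like (the programmatic equivalent of `taskset -pc`), Python exposes the Linux-only `sched_setaffinity`/`sched_getaffinity` syscalls; the core chosen below is arbitrary:

```python
import os

# All cores this process may currently run on (Linux-only API).
allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
core = min(allowed)                 # arbitrary choice for the sketch

os.sched_setaffinity(0, {core})     # pin to a single core, like `taskset -pc <core> <pid>`
assert os.sched_getaffinity(0) == {core}

os.sched_setaffinity(0, allowed)    # restore the original mask
```

Whether pinning actually helps depends on the workload; the default scheduler already balances runnable processes across cores, so manual placement mainly pays off when you want to keep a hot cache or isolate latency-sensitive work.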

Also, what's the proper way to measure the CPU usage of a process? There are /proc/PID/stat and getrusage, but both return used CPU time, while I need a percentage. (Note: this question has the lowest priority; if there's no response I will just check the source of top.) I know I can monitor the cores with mpstat.
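For what it's worth, a top-style percentage is just the delta of that CPU time over a wall-clock interval. A rough Linux-only sketch, sampling /proc/PID/stat twice (field positions per proc(5); the interval length is arbitrary):

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second

def cpu_ticks(pid):
    """Total user + system ticks consumed by pid so far."""
    with open(f"/proc/{pid}/stat") as f:
        # The comm field (2) may contain spaces; everything after the
        # closing ')' splits cleanly.
        fields = f.read().rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 in proc(5)
    return utime + stime

def cpu_percent(pid, interval=0.5):
    """Percentage of one core used by pid over the given interval."""
    t0 = cpu_ticks(pid)
    time.sleep(interval)
    t1 = cpu_ticks(pid)
    return (t1 - t0) / CLK_TCK / interval * 100.0
```

A fully busy single-threaded process comes out near 100; multithreaded processes can exceed it, so divide by the core count if you want a whole-machine figure.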

Upvotes: 2

Views: 547

Answers (1)

thkala

Reputation: 86443

Perhaps I am missing something, but why do you need to group the video streams in fixed sets?

From my understanding of the problem you will be essentially sampling each stream and processing the samples. If I were implementing something like this I would place all streams in a work queue, preferably one that supports work stealing to minimize thread starvation.

Each worker thread would get a stream object/descriptor/URI/whatever from the head of the queue, sample and process it, then move it back to the end of the queue.
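The scheme above can be sketched in a few lines; the stream names, the number of workers, and the empty `sample()` body are all placeholders for the real setup:

```python
import queue
import threading

streams = queue.Queue()
for name in ("cam-001", "cam-002", "cam-003"):  # hypothetical stream IDs
    streams.put(name)

def sample(stream):
    pass  # grab a frame/chunk from the stream and analyse it here

def worker(rounds):
    for _ in range(rounds):
        stream = streams.get()   # take from the head of the queue
        sample(stream)
        streams.put(stream)      # requeue at the tail

threads = [threading.Thread(target=worker, args=(10,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the queue itself serialises access, streams are naturally visited round-robin and faster workers simply take more turns, which is the point of the design: no fixed sets.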

CPU utilization should not be an issue, unless a single stream cannot always saturate a single core due to real-time constraints. If the latency while processing each sample is not an issue, then you have a few alternatives:

  • Use a larger number of processing threads, until all cores are fully utilized in all cases.

  • Use separate input threads to receive stream chunks and pass those for processing. This should decouple the network latencies from the actual stream processing.
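The second bullet amounts to a producer/consumer split: input threads block on the network, processing threads burn CPU, and a bounded queue in between provides backpressure. A sketch under assumed names (`fetch_chunk` and `process_chunk` stand in for the real I/O and analysis code):

```python
import queue
import threading

chunks = queue.Queue(maxsize=64)  # bounded: slow processing throttles input
DONE = object()                   # sentinel to shut the worker down
processed = []                    # for illustration; real code would act on the data

def fetch_chunk(stream_id):
    return (stream_id, b"\x00" * 188)  # stand-in for a blocking network read

def process_chunk(chunk):
    processed.append(chunk[0])         # CPU-heavy analysis would go here

def input_thread(stream_ids):
    for sid in stream_ids:
        chunks.put(fetch_chunk(sid))   # blocks only when the queue is full
    chunks.put(DONE)

def processing_thread():
    while True:
        chunk = chunks.get()
        if chunk is DONE:
            break
        process_chunk(chunk)

inp = threading.Thread(target=input_thread, args=(range(5),))
proc = threading.Thread(target=processing_thread)
inp.start(); proc.start()
inp.join(); proc.join()
```

With several input threads you would push one sentinel per processing thread (or use a shutdown flag) rather than the single `DONE` shown here.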

I am not aware of any worker queue implementation for distributed systems (as opposed to mere SMP systems), but it should be relatively easy to build one of your own if you don't find something that fits your needs...

Upvotes: 1
