jkool702

Reputation: 29

How can I efficiently measure average CPU usage for a group of processes (bash coprocs + their children) on Linux?

BACKGROUND: I wrote a bash function called forkrun that parallelizes code for you in the same way that parallel or xargs -P does. It is faster than parallel, and similar in speed to xargs -P while offering more options. forkrun works by spawning a number of persistent bash coprocs, each of which runs a loop (until some end condition is met) that reads N lines worth of data (passed on stdin) and runs those lines through whatever you are parallelizing.

GOAL: I am trying to determine the total CPU usage of all these coprocs combined. This needs to include the "overhead" CPU usage of the coproc running its loop and the total cumulative CPU usage of whatever it is running for you (which may or may not be a different PID, and could change on every loop iteration). So, I need the total CPU usage from all the coproc PIDs plus their children (and grandchildren, and great-grandchildren, and...).

END GOAL: I want forkrun to dynamically determine how many coprocs to spawn based on runtime conditions. Part of my strategy for this involves figuring out and tracking how much CPU time (on average) each of these coprocs is taking up. The current implementation of "dynamic coproc spawning" does this by looking at total system load (by polling /proc/stat) before and after some coprocs are spawned, but this is very noisy since it is influenced by everything else happening on the system.
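For context, the /proc/stat polling mentioned above amounts to roughly the following (an illustrative sketch, not forkrun's actual code):

# Read the aggregate "cpu" line from /proc/stat; summing its fields gives
# total system CPU time in clock ticks (guest time is already included in
# user/nice, so it is deliberately left out of the sum).
read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
total_ticks=$(( user + nice + system + idle + iowait + irq + softirq + steal ))

Comparing total_ticks before and after spawning coprocs gives the load delta, but (as noted) that delta includes every other process on the machine.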

IDEA: My initial idea was to read /proc/<PID>/stat for each coproc PID and pull out and sum the utime, stime, cutime and cstime fields. Unfortunately, cutime/cstime only account for the CPU time of waited-on children, i.e., children you fork and then call wait on. They don't include the CPU time of things like "stuff run in subshells".
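For reference, extracting and summing those fields for a single PID looks roughly like this (a sketch of the idea; cpu_ticks is a hypothetical helper, not part of forkrun):

# Sum utime+stime+cutime+cstime (fields 14-17 of /proc/<PID>/stat, in clock
# ticks). The comm field (field 2) can contain spaces, so strip everything
# up to the last ") " before splitting into fields.
cpu_ticks() {
    local pid=$1 rest
    rest=$(</proc/"$pid"/stat) || return
    rest=${rest##*) }     # positional fields now start at field 3 (state)
    set -- $rest
    echo $(( ${12} + ${13} + ${14} + ${15} ))   # utime stime cutime cstime
}

This returns clock ticks; dividing by CLK_TCK (typically 100) gives seconds. The problem described above remains: cutime/cstime only accumulate once a child has been waited on.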

NOTE: I'd rather avoid using external tools for this. I spent a lot of effort making sure forkrun has virtually no dependencies - currently its only hard dependencies are a recent bash version, a mounted procfs, and some binaries for basic filesystem operations (rm, mkdir). If an external tool is absolutely required then fine, but I'm 99% sure I can pull this info out of procfs somehow.

Thanks in advance!

EDIT: here is an example of the code run by the coprocs whose CPU time I am trying to track. I want to (from the process that forks these coprocs) figure out how much CPU time each coproc is using while it is running, in order to dynamically determine whether or not to spawn more of them.

Upvotes: 1

Views: 82

Answers (2)

dash-o

Reputation: 14493

Another possible solution - but it will require access to prctl - is a small "C" program, or Python (prctl or ctypes), or Perl (Linux::prctl). The basic idea is to run a small wrapper that will collect CPU time from all children, including forked children, background children, etc.

Something like:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    printf("Setting this process as a child subreaper...\n");

    /* Orphaned descendants get reparented to this process instead of init,
       so the wait() loop below reaps (and accounts for) all of them. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1) != 0) {
        perror("prctl");
        return 1;
    }

    if (fork() == 0) {
        /* Child: run the requested command */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }

    /* Reap every descendant until none remain */
    int status;
    while (wait(&status) > 0) {
    }

    return 0;
}

Compile with cc child_wait.c -o child_wait

And execute: time ./child_wait forkrun ...

Upvotes: 1

dash-o

Reputation: 14493

The script used by the OP is large (>2000 lines) and complex - not practical to analyze here - but from the OP's comments it uses coproc, background processes, etc.

Possible path:

  • Modify the EXIT trap to record the CPU time of each terminating process (conditional on an environment variable that specifies when to accumulate the log). At the end of the job, sum up the results.

Running the script without the env var set will not have any impact. When the env var is set, the stat line will be recorded, and a short python/perl/awk script can then aggregate the required measure - CPU, ... (see the aggregation sketch after the trap examples below).

trap '[ "$CPUSUM" ] && cat /proc/$$/stat >> $CPUSUM' exit

Or using a helper function:

record_cpu() {
    [ "$CPUSUM" ] && cat /proc/$$/stat >> "$CPUSUM"
}

trap record_cpu EXIT

You might have to apply the same to other signals, if the code uses TERM and similar to coordinate work between the coprocs.
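For the aggregation step, a minimal sketch (assuming each line appended to $CPUSUM is a raw /proc/<PID>/stat line, and the usual CLK_TCK of 100) might look like:

# Hypothetical aggregation step (not from the OP's script): sum
# utime+stime+cutime+cstime (fields 14-17 of each recorded stat line,
# located after the ") " that ends the comm field), then convert
# clock ticks to seconds.
awk '{ sub(/.*\) /, "")                         # drop "pid (comm) " prefix
       split($0, f, " ")
       ticks += f[12] + f[13] + f[14] + f[15]   # utime stime cutime cstime
     }
     END { printf "%.2f CPU-seconds\n", ticks / 100 }' "$CPUSUM"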

Upvotes: 1
