Jes

Reputation: 2684

Executing one or more new processes from a program

I am wondering what the best practice is for executing new processes (programs) from a running process. To be more specific, I am implementing a C/C++ job scheduler that has to run multiple binaries while communicating with them. Is exec or fork the common approach? Or is there a library that takes care of this?

Upvotes: 2

Views: 11122

Answers (2)

Mayukh Sarkar

Reputation: 2615

OK, let's start. There are a few ways to create another parallel task from an existing one, although not all of them create a new process.

Using the fork() system call

As you mentioned, fork() creates a new (child) process from your parent process. fork() has a few good points and a few bad ones.

Good things

  1. fork() creates a genuinely separate process, so on a multi-core CPU the parent and child can truly run in parallel.
  2. The child gets its own PID, which is handy if you ever need to kill that process explicitly.
  3. The wait() and waitpid() system calls let the parent wait for a child and collect its exit status.
  4. When a child terminates, the parent receives a SIGCHLD signal; by installing a handler with sigaction() and reaping children with waitpid(..., WNOHANG), the parent can handle finished children without blocking.

Bad things

  1. A forked child does not share its parent's address space: it gets its own copy, so if the parent has a variable var, the child cannot see the parent's later changes to var (and vice versa). Hence communication is a big issue.
  2. To communicate, you need an IPC mechanism such as pipes, named pipes (FIFOs), message queues, or shared memory.
  3. Pipes and named pipes are accessed with the read() and write() system calls, and message queues with their own send/receive calls. These calls block by default, which keeps your application synchronized for free, but every transfer copies data through the kernel, so they are slower than shared memory. Shared memory is the fastest IPC, but you access it directly rather than through read() and write(), so you need your own synchronization, such as semaphores, and getting that right in a bigger application is difficult.

Here come pthreads

Threads remove most of the difficulties that come with fork():

  1. Creating a thread doesn't create a separate process.
  2. Instead, it creates lightweight subtasks, which can run in parallel on a multi-core CPU.
  3. All threads share the same address space, so no IPC is needed to exchange data (shared data still has to be protected).
  4. They come with mutexes, which work well for synchronization even in bigger applications.
  5. Because all threads are part of the same process, they all have the same PID.

Note: In C++11 and later, std::thread (header <thread>) is part of the C++ standard library, not a system call.

Note 2: Boost.Thread is a mature C++ threading library and is also worth considering.

The main point, though, is knowing when to use a thread and when to use a process.

If a sub-task has to run in isolation and does not need to work with other tasks, use a process; otherwise, use a thread.

The exec family of system calls is different: exec does not create a new process, it replaces the program running in the current process and keeps the same PID. So if your application has, say, 500 lines and calls exec at line 250, the new program replaces your whole process, and execution will never resume at line 251 (exec only returns if it fails). Also, exec does not flush your stdio buffers, so call fflush() first if you have buffered output.

That said, if you create a separate process with fork() and then call exec in the child to perform the task, you are welcome to do so; this is the usual pattern. Just remember to set up IPC (for example, a pipe) before forking if the parent needs the child's output; otherwise the output is lost to the parent.

@John Zwinck's answer is also good. I know little about the select() system call, but that approach works too.

Edited as @Jonathan Leffler pointed out.

Edit, a long time later: After some years, I no longer think of using all these libraries or low-level ways of doing parallel, or should I say seemingly parallel, processing. Enter coroutines, the future of concurrent processing. Look at the following Go code; the same is possible in C/C++ too. For 7.7 million rows in a database, this code would be only a few milliseconds slower than its thread-based C/C++ implementation, but several times more manageable and scalable.

package main

import (
    "fmt"
    "reflect"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/sqlite"
)

type AirQuality struct {
    // gorm.Model
    // ID      uint   `gorm:"column:id"`
    Index   string `gorm:"column:index"`
    BEN     string `gorm:"column:BEN"`
    CH4     string `gorm:"column:CH4"`
    CO      string `gorm:"column:CO"`
    EBE     string `gorm:"column:EBE"`
    MXY     string `gorm:"column:MXY"`
    NMHC    string `gorm:"column:NMHC"`
    NO      string `gorm:"column:NO"`
    NO2     string `gorm:"column:NO_2"`
    NOX     string `gorm:"column:NOx"`
    OXY     string `gorm:"column:OXY"`
    O3      string `gorm:"column:O_3"`
    PM10    string `gorm:"column:PM10"`
    PM25    string `gorm:"column:PM25"`
    PXY     string `gorm:"column:PXY"`
    SO2     string `gorm:"column:SO_2"`
    TCH     string `gorm:"column:TCH"`
    TOL     string `gorm:"column:TOL"`
    Time    string `gorm:"column:date; type:timestamp"`
    Station string `gorm:"column:station"`
}

func (AirQuality) TableName() string {
    return "AQ"
}

func main() {
    c := generateRowsConcurrent("boring!!")

    for row := range c {
        fmt.Println(row)
    }
}

func generateRowsConcurrent(msg string) <-chan []string {
    c := make(chan []string)
    go func() {
        defer close(c) // close on every exit path so the range loop in main ends
        db, err := gorm.Open("sqlite3", "./load_testing_7.6m.db")
        if err != nil {
            panic("failed to connect database")
        }
        defer db.Close()
        rows, err := db.Model(&AirQuality{}).Limit(20).Rows()
        if err != nil {
            panic(err)
        }
        defer rows.Close() // only after the error check: rows is nil on failure
        for rows.Next() {
            var aq AirQuality
            if err := db.ScanRows(rows, &aq); err != nil {
                panic(err)
            }
            v := reflect.Indirect(reflect.ValueOf(aq))
            var buf []string
            for i := 0; i < v.NumField(); i++ {
                buf = append(buf, v.Field(i).String())
            }
            c <- buf
        }
    }()
    return c
}

Upvotes: 5

John Zwinck

Reputation: 249133

You can use popen() to spawn the processes and communicate with them. Note that popen() gives you a one-way pipe: mode "r" reads the child's stdout and mode "w" writes to its stdin. To handle communication with many processes from a single parent process, use select() or poll() to multiplex the reading/writing of the file descriptors given to you by popen() (you can use fileno() to turn a FILE* into an integer file descriptor).

If you want a library to abstract much of this for you, I suggest libuv. Here's a complete example program I whipped up, largely following the docs at https://nikhilm.github.io/uvbook/processes.html#spawning-child-processes:

#include <cstdio>
#include <cstdlib>
#include <inttypes.h>
#include <unistd.h> // STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO
#include <uv.h>

static void alloc_buffer(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf)
{
    *buf = uv_buf_init((char*)malloc(suggested_size), suggested_size);
}

// Called when a child's stdout pipe has data, or has reached EOF or an error.
void echo_read(uv_stream_t *stream, ssize_t nread, const uv_buf_t* buf)
{
    if (nread < 0) {
        // Any negative value is an error code; UV_EOF means the child closed stdout.
        if (nread != UV_EOF)
            fprintf(stderr, "read error: %s\n", uv_err_name((int)nread));
        uv_close((uv_handle_t*)stream, NULL);
    } else if (nread > 0) {
        // The buffer is not NUL-terminated, so write exactly nread bytes.
        fwrite(buf->base, 1, (size_t)nread, stdout);
    }
    free(buf->base); // allocated in alloc_buffer
}

// Not named on_exit, which is also the name of a glibc function in <stdlib.h>.
static void on_child_exit(uv_process_t *req, int64_t exit_status, int term_signal)
{
    fprintf(stderr, "Process %d exited with status %" PRId64 ", signal %d\n",
            req->pid, exit_status, term_signal);
    uv_close((uv_handle_t*)req, NULL);
}

int main()
{
    uv_loop_t* loop = uv_default_loop();
    const int N = 3;
    uv_pipe_t channel[N];
    uv_process_t child_req[N];

    for (int ii = 0; ii < N; ++ii) {
        char* args[3];
        args[0] = const_cast<char*>("ls");
        args[1] = const_cast<char*>(".");
        args[2] = NULL;

        uv_pipe_init(loop, &channel[ii], 0); // 0: plain pipe, not a handle-passing IPC pipe

        uv_stdio_container_t child_stdio[3]; // {stdin, stdout, stderr}
        child_stdio[STDIN_FILENO].flags = UV_IGNORE;
        child_stdio[STDOUT_FILENO].flags = uv_stdio_flags(UV_CREATE_PIPE | UV_WRITABLE_PIPE);
        child_stdio[STDOUT_FILENO].data.stream = (uv_stream_t*)&channel[ii];
        child_stdio[STDERR_FILENO].flags = UV_IGNORE;

        uv_process_options_t options = {};
        options.exit_cb = on_child_exit;
        options.file = "ls";
        options.args = args;
        options.stdio = child_stdio;
        options.stdio_count = sizeof(child_stdio) / sizeof(child_stdio[0]);

        int r;
        if ((r = uv_spawn(loop, &child_req[ii], &options))) {
            fprintf(stderr, "%s\n", uv_strerror(r));
            return EXIT_FAILURE;
        } else {
            fprintf(stderr, "Launched process with ID %d\n", child_req[ii].pid);
            uv_read_start((uv_stream_t*)&channel[ii], alloc_buffer, echo_read);
        }
    }

    return uv_run(loop, UV_RUN_DEFAULT);
}

The above will spawn three copies of ls to print the contents of the current directory. They all run asynchronously.

Upvotes: 5
