Reputation: 2684
I am wondering what the best practice is for executing new processes (programs) from a running process. To be more specific, I am implementing a C/C++ job scheduler that has to run multiple binaries while communicating with them. Is `exec` or `fork` common? Or is there a library that takes care of this?
Upvotes: 2
Views: 11122
Reputation: 2615
Okay, let's start. There are a few ways to create another parallel task from one task, although I wouldn't call all of them processes.
Using the `fork()` system call

As you have already mentioned, `fork()` creates a new process from your parent process. There are a few good things and a few bad things about `fork()`.

Good things

- `fork()` creates a completely separate process, and on multi-core CPU systems it can truly achieve parallelism.
- `fork()` creates a child process with a different pid, which is handy if you ever want to kill that process explicitly.
- The `wait()` and `waitpid()` system calls are a nice way to make the parent wait for the child.
- `fork()` raises a `SIGCHLD` signal in the parent, and with the `sigaction` function you can make the parent wait for the child without blocking it.

Bad things
- Forked processes do not share the same address space: if one process has, say, a variable `var`, the other process cannot directly access that same `var`. Communication is therefore a big issue, and you need an IPC mechanism: `pipe`, named pipes, message queues, or shared memory.
- `pipe`, named pipes, and message queues can use the `read` and `write` system calls; because `read` and `write` are blocking system calls, your application stays synchronized, but these IPCs are very slow. The only fast IPC is shared memory, but it cannot use `read` and `write`, so you need your own synchronization mechanism, such as semaphores, and implementing semaphores for bigger applications is difficult.

Here comes `pthread`
Now, threads remove the difficulties faced with `fork`:

- Threads share the same address space, so there is no need for any IPC.
- Threads come with `mutex`, which is wonderful for any synchronization needed, even in bigger applications.

Note: in C++, `thread` is part of the C++ library, not a system call.
Note 2: Boost threads in C++ are much more mature and recommended to use.
The main idea, though, is to know when to use a thread and when to use a process. If you need to create a sub-task that must work in isolation rather than cooperate with other tasks, use a process; otherwise use a thread.
The `exec` family of syscalls is different: an `exec` call keeps the same pid but replaces your whole process image. So if your application has, say, 500 lines and you make an `exec` call at line 250, the new program is pasted over your entire process, and after the `exec` call your program will not resume from line 251. Also, `exec` calls don't flush your stdio buffers.

But yes, if you intend to create a separate process and then use an `exec` call in it to perform the task and exit, you are welcome to do that; just remember to use IPC to collect the output, otherwise it is of no use.
@John Zwinck's answer is also good. I know only a little about the `select()` system call, but yes, it is possible that way too.

Edited: as @Jonathan Leffler pointed out.
Editing after a long time: after some years, I now never think of using all these SPOOKY libraries or senseless gruesome ways of parallel (or should I say SEEMINGLY parallel) processing. Enter coroutines, the future of CONCURRENT processing. Look at the following Go code; the same idea is possible in C/C++ too. For 7.7 million rows in a database, this code would be at most a few milliseconds slower than a C/C++ thread-based implementation, but several times more manageable and scalable.
package main

import (
	"fmt"
	"reflect"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/sqlite"
)

type AirQuality struct {
	// gorm.Model
	// ID uint `gorm:"column:id"`
	Index   string `gorm:"column:index"`
	BEN     string `gorm:"column:BEN"`
	CH4     string `gorm:"column:CH4"`
	CO      string `gorm:"column:CO"`
	EBE     string `gorm:"column:EBE"`
	MXY     string `gorm:"column:MXY"`
	NMHC    string `gorm:"column:NMHC"`
	NO      string `gorm:"column:NO"`
	NO2     string `gorm:"column:NO_2"`
	NOX     string `gorm:"column:NOx"`
	OXY     string `gorm:"column:OXY"`
	O3      string `gorm:"column:O_3"`
	PM10    string `gorm:"column:PM10"`
	PM25    string `gorm:"column:PM25"`
	PXY     string `gorm:"column:PXY"`
	SO2     string `gorm:"column:SO_2"`
	TCH     string `gorm:"column:TCH"`
	TOL     string `gorm:"column:TOL"`
	Time    string `gorm:"column:date; type:timestamp"`
	Station string `gorm:"column:station"`
}

func (AirQuality) TableName() string {
	return "AQ"
}

func main() {
	c := generateRowsConcurrent("boring!!")
	for row := range c {
		fmt.Println(row)
	}
}

func generateRowsConcurrent(msg string) <-chan []string {
	c := make(chan []string)
	go func() {
		defer close(c) // let the consumer's range loop terminate
		db, err := gorm.Open("sqlite3", "./load_testing_7.6m.db")
		if err != nil {
			panic("failed to connect database")
		}
		defer db.Close()
		rows, err := db.Model(&AirQuality{}).Limit(20).Rows()
		if err != nil { // check the error before touching rows
			panic(err)
		}
		defer rows.Close()
		for rows.Next() {
			var aq AirQuality
			db.ScanRows(rows, &aq)
			v := reflect.Indirect(reflect.ValueOf(aq))
			var buf []string
			for i := 0; i < v.NumField(); i++ {
				buf = append(buf, v.Field(i).String())
			}
			c <- buf
		}
	}()
	return c
}
Upvotes: 5
Reputation: 249133
You can use `popen()` to spawn the processes and communicate with them. To handle communication with many processes from a single parent process, use `select()` or `poll()` to multiplex the reading/writing of the file descriptors given to you by `popen()` (you can use `fileno()` to turn a `FILE*` into an integer file descriptor).
If you want a library to abstract much of this for you, I suggest libuv. Here's a complete example program I whipped up, largely following the docs at https://nikhilm.github.io/uvbook/processes.html#spawning-child-processes:
#include <cstdio>
#include <cstdlib>
#include <inttypes.h>
#include <uv.h>

static void alloc_buffer(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf)
{
    *buf = uv_buf_init((char*)malloc(suggested_size), suggested_size);
}

void echo_read(uv_stream_t *server, ssize_t nread, const uv_buf_t *buf)
{
    if (nread < 0) {
        if (nread != UV_EOF)                  // EOF is normal, not an error
            fprintf(stderr, "echo_read error: %s\n", uv_strerror((int)nread));
    } else if (nread > 0) {
        fwrite(buf->base, 1, nread, stdout);  // buffer is not NUL-terminated
    }
    free(buf->base);                          // we own the buffer from alloc_buffer
}

static void on_exit(uv_process_t *req, int64_t exit_status, int term_signal)
{
    fprintf(stderr, "Process %d exited with status %" PRId64 ", signal %d\n",
            req->pid, exit_status, term_signal);
    uv_close((uv_handle_t*)req, NULL);
}

int main()
{
    uv_loop_t* loop = uv_default_loop();
    const int N = 3;
    uv_pipe_t channel[N];
    uv_process_t child_req[N];
    for (int ii = 0; ii < N; ++ii) {
        char* args[3];
        args[0] = const_cast<char*>("ls");
        args[1] = const_cast<char*>(".");
        args[2] = NULL;
        uv_pipe_init(loop, &channel[ii], 1);
        uv_stdio_container_t child_stdio[3]; // {stdin, stdout, stderr}
        child_stdio[STDIN_FILENO].flags = UV_IGNORE;
        child_stdio[STDOUT_FILENO].flags = uv_stdio_flags(UV_CREATE_PIPE | UV_WRITABLE_PIPE);
        child_stdio[STDOUT_FILENO].data.stream = (uv_stream_t*)&channel[ii];
        child_stdio[STDERR_FILENO].flags = UV_IGNORE;
        uv_process_options_t options = {};
        options.exit_cb = on_exit;
        options.file = "ls";
        options.args = args;
        options.stdio = child_stdio;
        options.stdio_count = sizeof(child_stdio) / sizeof(child_stdio[0]);
        int r;
        if ((r = uv_spawn(loop, &child_req[ii], &options))) {
            fprintf(stderr, "%s\n", uv_strerror(r));
            return EXIT_FAILURE;
        } else {
            fprintf(stderr, "Launched process with ID %d\n", child_req[ii].pid);
            uv_read_start((uv_stream_t*)&channel[ii], alloc_buffer, echo_read);
        }
    }
    return uv_run(loop, UV_RUN_DEFAULT);
}
The above will spawn three copies of `ls`, each printing the contents of the current directory. They all run asynchronously.
Upvotes: 5