Reputation: 147
Blow is an implementation of Sieve of Eratosthenes in C
#include "kernel/stat.h"
#include "kernel/types.h"
#include "user/user.h"
void cull(int p) {
int n;
while (read(0, &n, sizeof(n))) {
// n is not prime
if (n % p != 0) {
write(1, &n, sizeof(n));
}
}
}
void redirect(int k, int pd[]) {
close(k);
dup(pd[k]);
close(pd[0]);
close(pd[1]);
}
void right() {
int pd[2], p;
// read p
if (read(0, &p, sizeof(p))) {
printf("prime %d\n", p);
pipe(pd);
if (fork()) {
// parent
redirect(0, pd);
right();
} else {
redirect(1, pd);
cull(p);
}
}
}
int main(int argc, char *argv[]) {
int pd[2];
pipe(pd);
if (fork()) {
redirect(0, pd);
right();
} else {
redirect(1, pd);
for (int i = 2; i < 36; i++) {
write(1, &i, sizeof(i));
}
}
exit(0);
I am not quite following the logic here:
1). Why does redirect need to close pd1?
2). cull() is reading from the file descriptor 0, but in right() the child process will close 0. Why doesn't this cause any problem?
3). why it is necessary to use redirect here when we only want read from 0 and write to 1?
Update:
1). I found this implementation in the following blog post:
https://abcdlsj.github.io/post/mit-6.828-lab1xv6-and-unix-utilities/
2). I think the idea is borrowed from the inventor of this algorithm
3). I think the reason that the header is so is because it is implemented in a toy operating system for educational purpose.
Upvotes: 2
Views: 1160
Reputation: 165165
The code is an adaptation of the paper Coroutine prime number sieve by M. Douglas McIlroy to the Xv6 operating system, a re-implementation of Version 6 Unix used for teaching. The technique is from 1968 as an experiment in co-routines via pipes. The paper explains the algorithm and rationale.
The program implements the sieve of Eratosthenes as a kind of production line, in which each worker culls multiples of one prime from a passing stream of integers, and new workers are recruited as primes are discovered.
When Unix came to be, my fascination with coroutines led me to badger its author, KenThompson, to allow writes in one process to go not only to devices but also to matching reads in another process.
...the coroutine sieve has been a standard demo for languages or systems that support interprocess communication. Implementations using Unix processes typically place the three coroutines—source, cull and sink—in distinct executable files.The fact that the whole program can be written as a single source file, in a language that supports neither concurrency nor IPC, is a tribute not only to Unix’s pipes, but also to its clean separation of program initiation into fork for duplicating address spaces and exec for initializing them.
I believe using stdin and stdout is an artifact of its origins in the early days of Unix when piping stdin and stdout between processes was first introduced. It makes a lot more sense in shell.
#!/bin/bash
source() {
seq 2 1000000
}
cull() {
while true
do
read n
(($n % $1 != 0)) && echo $n
done
}
sink() {
read p
echo $p
cull $p | sink &
}
source | sink
In C, as we'll see, it's simpler to skip the redirection and pass around pipes.
First, what's going on?
redirect
is redirecting stdin and stdout to a pipe. 0 is stdin and 1 is stdout. This can be made more clear by using STDIN_FILENO and STDOUT_FILENO.
main
makes a pipe.main
forks.
main
redirects stdin to the pipe.main
calls right
.right
reads the first prime, 2, from stdin which is a pipe to the number stream.[number stream] ->(2) [right]
After the initial condition, a switcheroo happens inside right
.
right
makes a pipe.right
forks.
cull
.cull
reads from stdin (the number stream) and writes to stdout (right
).right
redirects its stdin to the pipe, reading from cull
.right
recurses.[number stream] ->(n) [cull] ->(p) [right]
After the first call right
is reading primes from cull
and writing them to the real stdout
. cull
reads candidates from the number stream
and writes primes to right
.
When the number stream
loop ends the process ends and closes its pipe to cull
. Once all the numbers have been read from the pipe, cull
to reads EOF ending its loop and its process, closing its pipe to right
. right
reads EOF and returns back to main which exits.
To explain redirect
we need to understand redirection in C.
First, a simple one-way pipe.
int pd[2];
pipe(pd);
//parent
if (fork()) {
// Parent must close the input side else reading from pd[0] will
// continue to try to read from pd[1] even after the child closes
// their pipe.
close(pd[1]);
int p;
while( read(pd[0], &p, sizeof(p)) ) {
printf("p = %d\n", p);
}
fprintf(stderr, "parent done reading\n");
}
// child
else {
// Not strictly necessary, but the child will not be reading.
close(pd[0]);
for (int i = 2; i < 10; i++) {
write(pd[1], &i, sizeof(i));
}
// Tell the parent we're done writing to the pipe.
// The parent will get EOF on its next read. If the child
// does not close the pipe, the parent will hang waiting to read.
close(pd[1]);
fprintf(stderr, "child done writing\n");
// Pipes are closed automatically when a process exits, but
// let's simulate the child not immediately exiting to
// illustrate why it's important to explicitly close pipes.
sleep(1);
}
The parent and child share a pipe. The parent reads from one end, the child writes to the other. The child closes their write end when they're done so the parent doesn't hang trying to read. The parent closes their write end immediately so their pipe doesn't try to read from it.
Instead of passing the pipe around, redirect
is redirecting the parent's half to stdin and the child's half to stdout. Let's do that in our simple example using dup2
. dup2
duplicates a descriptor to another, first closing the target.
int pd[2];
pipe(pd);
if (fork()) {
// Redirect pd[0] to stdin.
dup2(pd[0], STDIN_FILENO);
// Parent still has to close its input side.
close(pd[1]);
int p;
while( read(STDIN_FILENO, &p, sizeof(p)) ) {
printf("p = %d\n", p);
}
fprintf(stderr, "parent done reading\n");
} else {
// Redirect pd[1] to stdout.
dup2(pd[1], STDOUT_FILENO);
// Close the original pd[1] so the parent doesn't try to read from it.
close(pd[1]);
for (int i = 2; i < 10; i++) {
write(STDOUT_FILENO, &i, sizeof(i));
}
// Tell the parent we're done writing.
close(STDOUT_FILENO);
fprintf(stderr, "child done writing\n");
sleep(1);
}
The final twist is dup
. dup
duplicates pd[k]
to the lowest numbered descriptor currently not in use by the process. redirect(0, pd)
closes descriptor 0 and then copies pd[0]
to the lowest numbered descriptor: 0.
redirect(1, pd)
closes descriptor 1 and then copies pd[1]
to what it hopes is the lowest numbered descriptor: 1. If something else closed 0, redirect(1, pd)
will copy pd[1]
to descriptor 0 and the code will not work. This can be avoided by using dup2
which makes it explicit which file descriptor you're copying to.
// close(k) and dup(pd[k]) to k safely and explicitly.
dup2(pd[k], k);
redirect
can be rewritten as:
void redirect(int k, int pd[]) {
dup2(pd[k], k);
close(pd[0]);
close(pd[1]);
}
Note that is all for a one-way pipe. cull
uses bi-directional pipes, but the idea is the same.
By redirecting its pipe to stdin and stdout the program can use the pipe without having to pass the pipe around. This lets right
read the first prime from the number generator and then let cull
read the rest. It could also be done explicitly.
With some simpler examples in place, now we can answer the questions.
1). Why does redirect need to close pd[1]?
The parent must close the input side of its pipe, even after it's been duplicated or closed by the child, else the pipe will remain open and the parent will hang trying to read from it.
- cull() is reading from the file descriptor 0, but in right() the child process will close 0. Why doesn't this cause any problem?
right
closes its 0 and then copies pd[0] to 0. cull
does not close 0, it closes pd[0]
. cull
reads from the original 0 which is the number generator.
- Why it is necessary to use redirect here when we only want read from 0 and write to 1?
Because we need 0 and 1 to be different things at different times. We don't really want to read from 0 and write to 1. We want to read and write from pipes which happen to be attached to 0 and 1. The program is redirecting its pipes to 0 and 1 to demonstrate how Unix pipes and redirection works internally.
It took a dabbler like me some time to figure out how this program worked, it would have been a lot easier if I'd read the original paper and seen the shell script version first.
It can be rewritten to use explicit pipes. This avoids action at a distance, is easier to understand, and still demonstrates pipes and co-routines, but it no longer illustrates redirection.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
void cull(int p, int read_d, int write_d) {
int n;
while (read(read_d, &n, sizeof(n))) {
if (n % p != 0) {
write(write_d, &n, sizeof(n));
}
}
}
void right(int read_d) {
int p;
if (read(read_d, &p, sizeof(p))) {
printf("prime %d\n", p);
int cull_pipe[2];
pipe(cull_pipe);
if (fork()) {
// right() reads from cull().
close(cull_pipe[1]);
right(cull_pipe[0]);
} else {
// cull() reads from the number generator and writes to right().
close(cull_pipe[0]);
cull(p, read_d, cull_pipe[1]);
close(cull_pipe[1]);
}
}
}
int main(int argc, char *argv[]) {
int pd[2];
pipe(pd);
if (fork()) {
// The first call to right() reads from the number generator.
close(pd[1]);
right(pd[0]);
} else {
close(pd[0]);
for (int i = 2; i < 6; i++) {
write(pd[1], &i, sizeof(i));
}
close(pd[1]);
}
exit(0);
}
Other notes:
The headers can be replaced with standard headers.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
Upvotes: 6