Reputation: 69
For an exercise, I have to write a program that compares two files in parallel. The degree of parallelism is decided by the user through the -j parameter.
Eg:
./myprog -j 8 file1 file2
means that 8 processes will be created and each one will compare 1/8 of the two files.
Assume we have 3 files of which 2 are equal (file1 = file2).
When the program compares two different files, everything is fine (or at least the error is hid).
When I go to compare two identical files, about once in four, the program says "file1 file2 differ" which is obviously wrong.
Can someone help me to understand why and how to fix it? Thank you!
#include <...>
int cmp(int, int, int, int);
int main(int argc, char *argv[]){
int len, status;
int fd1, fd2;
int term;
int i, j;
int opt, num;
int start, stop;
//[...] Various checks on files length and getopt()
num = atoi(optarg);
int pid[num];
for(i = 0; i < num; i++){
pid[i] = fork();
if(pid[i] == 0){
start = (len/num)*i;
stop = ((len/num)*(i+1))-1;
if(cmp(fd1, fd2, start, stop)) abort();
else exit(EXIT_SUCCESS);
}else if(pid[i] < 0){
perror(NULL);
exit(EXIT_FAILURE);
}
}
for(i = 0; i < num; i++){
term = wait(&status);
if(WIFSIGNALED(status)){
//when the error occours start or stop result gives a rondom high number like 1835627636
printf("PID-> %d START %d STOP %d\n", term, start, stop);
fprintf(stderr, "%s\n", "file1 file2 differ");
for(j = 0; j < num; j++){
if(pid[j] != term){
printf("KILL %d\n", j);
kill(pid[j], 0);
}
}
exit(EXIT_FAILURE);
}
}
exit(EXIT_SUCCESS);
}
int cmp(int fd1, int fd2, int start, int stop){
char buf1, buf2;
if(start > stop) return 0;
else{
fsync(fd1);
fsync(fd2);
lseek(fd1, start, SEEK_SET);
read(fd1, &buf1, sizeof(char));
lseek(fd2, start, SEEK_SET);
read(fd2, &buf2, sizeof(char));
if(buf1 != buf2) return 1;
else cmp(fd1, fd2, start+1, stop);
}
}
Upvotes: 1
Views: 312
Reputation: 33631
cmp
may return a random value (i.e. undefined behavior) because the final else
is:
else cmp(fd1, fd2, start+1, stop);
It should be:
else return cmp(fd1, fd2, start+1, stop);
Also, in main
, start/stop
are only set in the child process, so they're invalid in the parent process. Also, even if they were valid in the parent, they would only be set to the value of the last child forked.
Also, there is no guarantee that wait
will return the pids in order. (e.g. It might return pid[3]
before pid[2]
. Thus, you'd try to kill processes when you shouldn't. You may want to use waitpid
instead. Or just do:
int retval = EXIT_SUCCESS;
while (1) {
term = wait(&status);
if (term < 0)
break;
if (WIFSIGNALED(status)) {
printf(...);
retval = EXIT_FAILURE;
}
}
exit(retval);
And, remove the kill
as it's not really needed and is more likely to cause issues (i.e. it's a source of one of your bugs). Oops! I just noticed that you're sending a kill
with a signal value of zero. This is largely a no-op.
UPDATE:
By launching the program on the same file and printing the characters that are compared, the program prints the same characters except the last two which are reversed in order and therefore the comparison buf1!=buf2 is true and cmp return 1. Could it be a race condition prolem?
Yes, exactly. Because each child uses the same file descriptor, an lseek
in one child will affect the file positions in all the others.
From the fork
manpage:
The child inherits copies of the parent's set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two file descriptors share open file status flags, file offset, and signal-driven I/O attributes
Here is a bit more about the effect of sharing file offsets from the dup
manpage:
After a successful return, the old and new file descriptors may be used interchangeably. They refer to the same open file description (see open(2)) and thus share file offset and file status flags; for example, if the file offset is modified by using lseek(2) on one of the file descriptors, the offset is also changed for the other. (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
To solve this, have each child do it's own/unique open
of the two files. Then, they won't share file offsets and there will be no race condition. The extra overhead of the extra open/close
will be minimal.
A few more points ...
Signals are usually reserved for exceptional/unexpected conditions (e.g. segmentation fault, etc). Thus, using abort
in the child to communicate a mismatch can/should be replaced with a non-zero exit code.
This is much cleaner and affords a greater degree of flexibility. EXIT_SUCCESS/EXIT_FAILURE
are a relatively new set of defines. When doing exit(code)
, the child is allowed to return any 7 bit error code (i.e. 0-127 inclusive). See the man pages for cmp
and rsync
as an example.
Here, you could use any convention you wish for child error codes like: 0=match, 1=mismatch, 2=short read on one file, 3=I/O error, etc.
As Aganju pointed out, making the cmp
function recursive will be very inefficient and crash the stack. Most stacks are defaulted to ~8MB, so this limits how large a file you can process. Consider your example of 8 processes. If the file size is greater than 64MB, you'll overflow the stack and get a segfault, even if the files are equal.
Plus, doing a read
for a single byte is very slow and completely negates any speed gains you get for parallel processes.
So, even though you split up the work between several processes, each child process will still have to loop through smaller chunks of its own segment of responsibility
When reading the optimal buffer size isn't always the largest. For some file systems (e.g. ext4
) the recommended size, IIRC, is 64KB.
I also recommend a custom struct that keeps track of the start/stop
positions and various other per-child data.
Here is a refactored version of your code that illustrates much of this. It isn't tested, and still a bit rough, but it should help you get further [please pardon the gratuitous style cleanup].
Things to note/check:
Files of differing length don't need to be compared. They can never match. Herein cmp
assumes the files are of equal length.
The number of processes should be clipped to the file size (e.g. if you have 8 processes and the file length is 4, the number of processes should be clipped to 4).
The range/limit/size calculations should be doubled checked as there may be "off-by-one" errors.
Also, be sure to use off_t
(which guarantees 64 bit values with the correct #define
set before any #include
directives) to allow the program to work correctly with files >2GB
#include <...>
char *file1
char *file2;
typedef struct pidctl {
int chunknum;
off_t start;
off_t stop;
} pidctl_t;
#define CHUNKSIZE (64 * 1024)
int retcode = EXIT_SUCCESS;
int cmp(pidctl_t *ctl);
int
main(int argc, char *argv[])
{
off_t len;
int status;
int fd1;
int fd2;
int term;
int i,
j;
int opt,
num;
int start,
stop;
//[...] Various checks on files length and getopt()
num = atoi(optarg);
// process count should be clipped to number of bytes in file
if (num > filesize)
num = filesize;
pidctl_t pidlist[num];
for (i = 0; i < num; i++) {
ctl = &pidlist[i];
ctl->chunknum = i;
ctl->start = (len / num) * i;
ctl->stop = ((len / num) * (i + 1)) - 1;
// the last segment must be clipped to the file size
if (ctl->stop >= len)
ctl->stop = len - 1;
ctl->pid = fork();
if (ctl->pid == 0)
exit(cmp(ctl));
if (ctl->pid < 0) {
perror(NULL);
exit(EXIT_FAILURE);
}
}
while (1) {
term = wait(&status);
if (term < 0)
break;
for (i = 0; i < num; i++) {
ctl = &pidlist[i];
if (term == ctl->pid) {
if (WIFSIGNALED(status)) {
printf("PID-> %d START %lld STOP %lld -- signal %d\n",
ctl->pid, ctl->start, ctl->stop, WTERMSIG(status));
retcode = 2;
}
if (WIFEXITED(status)) {
int code = WEXITSTATUS(status);
printf("PID-> %d START %lld STOP %lld -- code %d -- %s\n",
ctl->pid, ctl->start, ctl->stop,
code,code ? "FAIL" : "OK");
if (code)
retcode = 1;
}
continue;
}
}
}
exit(retcode);
}
int
cmp(pidctl_t *ctl)
{
int fd1 = -1;
char buf1[CHUNKSIZE];
int fd2 = -1;
char buf2[CHUNKSIZE];
off_t pos;
off_t remain;
int rlen1;
int rlen2;
int xlen;
int code = 0;
do {
if (ctl->start > ctl->stop)
break;
fd1 = open(file1,O_RDONLY);
if (fd1 < 0) {
code = 2;
break;
}
pos = lseek(fd1, ctl->start, SEEK_SET);
if (pos != ctl->start) {
code = 3;
break;
}
fd2 = open(file2,O_RDONLY);
if (fd2 < 0) {
code = 2;
break;
}
pos = lseek(fd2, ctl->start, SEEK_SET);
if (pos != ctl->start) {
code = 3;
break;
}
remain = (ctl->stop - ctl->start) + 1;
for (; remain > 0; remain -= xlen)
xlen = CHUNKSIZE;
if (xlen > remain)
xlen = remain;
rlen1 = read(fd1, buf1, xlen);
if (rlen1 != xlen) {
code = 4;
break;
}
rlen2 = read(fd2, buf2, xlen);
if (rlen2 != xlen) {
code = 5;
break;
}
if (memcmp(buf1,buf2,xlen) != 0) {
code = 1;
break;
}
}
} while (0);
if (fd1 >= 0)
close(fd1);
if (fd2 >= 0)
close(fd2);
return code;
}
Upvotes: 1
Reputation: 6405
Without the code that assigns fd1 and fd2, it is a guessing, but I think your cmp
function blindly assumes that fsync
, lseek
, and read
will work, without checking their return values. If either one fails(for example, because the file is locked), the respective buffer will not match of course, and that's what it reports back.
Note that making cmp
recursive it very inefficient and completely superfluous. It would work, but it produces a mile long stack and even might crash you program for larger files. A simple while or for would suffice.
Upvotes: 1