Reputation: 83
I need to read two 1MB+ binary files byte by byte, compare them - If they're not equal, print out the next 16 bytes starting at the unequal byte. The requirement is that it all runs in just 5msecs. Currently, my program is taking 19msecs if the unequal bit is at the end of the two files. Are there any suggestions as to how I can optimize it?
#include <stdio.h> //printf
#include <unistd.h> //file open
#include <fcntl.h> //file read
#include <stdlib.h> //exit()
#include <time.h> //clock
#define SIZE 4096
void compare_binary(int fd1, int fd2)
{
int cmpflag = 0;
int errorbytes = 1;
char c1[SIZE], c2[SIZE];
int numberofbytesread = 1;
while(read(fd1, &c1, SIZE) == SIZE && read(fd2, &c2, SIZE) == SIZE && errorbytes < 17){
for (int i=0 ; i < SIZE ; i++) {
if (c1[i] != c2[i] && cmpflag == 0){
printf("Bytes not matching at offset %d\n",numberofbytesread);
cmpflag = 1;
}
if (cmpflag == 1){
printf("Byte Output %d: 0x%02x 0x%02x\n", errorbytes, c1[i], c2[i]);
errorbytes++;
}
if (errorbytes > 16){
break;
}
numberofbytesread++;
}
}
}
int main(int argc, char *argv[])
{
int fd[2];
if (argc < 3){
printf("Check the number of arguments passed.\n");
printf("Usage: ./compare_binary <binaryfile1> <binaryfile2>\n");
exit(0);
}
if (!((access(argv[1], F_OK) == 0) && (access(argv[2], F_OK) == 0))){
printf("Please check if the files passed in the argument exist.\n");
exit(0);
}
fd[0] = open(argv[1], O_RDONLY);
fd[1] = open(argv[2], O_RDONLY);
if (fd[0]< 0 && fd[1] < 0){
printf("Can't open file.\n");
exit(0);
}
clock_t t;
t = clock();
compare_binary(fd[0], fd[1]);
t = clock() - t;
double time_taken = ((double)t)/(CLOCKS_PER_SEC/1000);
printf("compare_binary took %f milliseconds to execute \n", time_taken);
}
Basically need the optimized way to read binary files over 1MB such that they can be done under 5msecs.
Upvotes: 3
Views: 569
Reputation: 2628
first thing caught my eye is this
if (cmpflag == 1){
printf("Byte Output %d: 0x%02x 0x%02x\n", errorbytes, c1[i], c2[i]);
errorbytes++;
}
if (errorbytes > 16){
break;
}
yourcmpflag
checking is useless maybe this thing do a little optimaztion
if (c1[i] != c2[i] && cmpflag == 0){
printf("Bytes not matching at offset %d\n",numberofbytesread);
printf("Byte Output %d: 0x%02x 0x%02x\n", errorbytes, c1[i], c2[i]);
errorbytes++;
if (errorbytes > 16){
break;
}
}
you can do array compare built in function, or increase your buffer too
Upvotes: 0
Reputation: 17288
First, try reading larger blocks. There's no point in performing so many read calls when you can read everything at once. Using 2 MB of memory is not a deal nowadays. Disk I/O calls are inherently expensive, their overhead is significant too, but can be reduced.
Second, try comparing integers (or even 64-bit longs) instead of bytes in each iteration, that reduces the number of loops you need to do significantly. Once you find a missmatch, you can still switch to the byte-per-byte implementation. (of course, some extra trickery is required if the file length is not a multiple of 4 or 8).
Upvotes: 3