Shantanu Mhapankar
Shantanu Mhapankar

Reputation: 83

How to optimize binary file(over 1MB) read in C?

I need to read two 1MB+ binary files byte by byte, compare them - If they're not equal, print out the next 16 bytes starting at the unequal byte. The requirement is that it all runs in just 5msecs. Currently, my program is taking 19msecs if the unequal bit is at the end of the two files. Are there any suggestions as to how I can optimize it?

#include <stdio.h>  //printf
#include <unistd.h> //file open
#include <fcntl.h>  //file read
#include <stdlib.h> //exit()
#include <time.h>   //clock

#define SIZE 4096

void compare_binary(int fd1, int fd2)
{   
    int cmpflag = 0;
    int errorbytes = 1;
    char c1[SIZE], c2[SIZE];
    int numberofbytesread = 1;

    while(read(fd1, &c1, SIZE) == SIZE && read(fd2, &c2, SIZE) == SIZE && errorbytes < 17){
        for (int i=0 ; i < SIZE ; i++) {
            if (c1[i] != c2[i] && cmpflag == 0){
                printf("Bytes not matching at offset %d\n",numberofbytesread);
                cmpflag = 1;
            }
            if (cmpflag == 1){
                printf("Byte Output %d: 0x%02x 0x%02x\n", errorbytes, c1[i], c2[i]);
                errorbytes++;
            }
            if (errorbytes > 16){
                break;
            }
            numberofbytesread++;
        }
    }
}

int main(int argc, char *argv[])
{
    int fd[2];
    if (argc < 3){
        printf("Check the number of arguments passed.\n");
        printf("Usage: ./compare_binary <binaryfile1> <binaryfile2>\n");
        exit(0);
    }
    if (!((access(argv[1], F_OK) == 0) && (access(argv[2], F_OK) == 0))){
        printf("Please check if the files passed in the argument exist.\n");
        exit(0);
    }

    fd[0] = open(argv[1], O_RDONLY);
    fd[1] = open(argv[2], O_RDONLY);

    if (fd[0]< 0 && fd[1] < 0){
        printf("Can't open file.\n");
        exit(0);
    }
    clock_t t;
    t = clock();
    compare_binary(fd[0], fd[1]);
    t = clock() - t;
    double time_taken = ((double)t)/(CLOCKS_PER_SEC/1000);
    printf("compare_binary took %f milliseconds to execute \n", time_taken);
}

Basically need the optimized way to read binary files over 1MB such that they can be done under 5msecs.

Upvotes: 3

Views: 569

Answers (2)

nima moradi
nima moradi

Reputation: 2628

first thing caught my eye is this

 if (cmpflag == 1){
            printf("Byte Output %d: 0x%02x 0x%02x\n", errorbytes, c1[i], c2[i]);
            errorbytes++;
        }
        if (errorbytes > 16){
            break;
        }

yourcmpflag checking is useless maybe this thing do a little optimaztion

  if (c1[i] != c2[i] && cmpflag == 0){
            printf("Bytes not matching at offset %d\n",numberofbytesread);
            printf("Byte Output %d: 0x%02x 0x%02x\n", errorbytes, c1[i], c2[i]);
            errorbytes++;
            if (errorbytes > 16){
                break;
            }
        }

you can do array compare built in function, or increase your buffer too

Upvotes: 0

PMF
PMF

Reputation: 17288

First, try reading larger blocks. There's no point in performing so many read calls when you can read everything at once. Using 2 MB of memory is not a deal nowadays. Disk I/O calls are inherently expensive, their overhead is significant too, but can be reduced.

Second, try comparing integers (or even 64-bit longs) instead of bytes in each iteration, that reduces the number of loops you need to do significantly. Once you find a missmatch, you can still switch to the byte-per-byte implementation. (of course, some extra trickery is required if the file length is not a multiple of 4 or 8).

Upvotes: 3

Related Questions