srijeet

Reputation: 111

Reading a large file (greater than 4GB) in C using the read function causes problems

I have to write C code for reading large files. The code is below:

int read_from_file_open(char *filename,long size)
{
    long read1=0;
    int result=1;
    int fd;
    int check=0;
    long *buffer=(long*) malloc(size * sizeof(int));
    fd = open(filename, O_RDONLY|O_LARGEFILE);
    if (fd == -1)
    {
       printf("\nFile Open Unsuccessful\n");
       exit (0);;
    }
    long chunk=0;
    lseek(fd,0,SEEK_SET);
    printf("\nCurrent Position%d\n",lseek(fd,size,SEEK_SET));
    while ( chunk < size )
    {
        printf ("the size of chunk read is  %d\n",chunk);
        if ( read(fd,buffer,1048576) == -1 )
        {
            result=0;
        }
        if (result == 0)
        {
            printf("\nRead Unsuccessful\n");
            close(fd);
            return(result);
        }

        chunk=chunk+1048576;
        lseek(fd,chunk,SEEK_SET);
        free(buffer);
    }

    printf("\nRead Successful\n");

    close(fd);
    return(result);
}

The issue I am facing here is that as long as the argument passed (size parameter) is less than 264000000 bytes, it seems to be able to read. I am getting the increasing sizes of the chunk variable with each cycle.

When I pass 264000000 bytes or more, the read fails, i.e., according to the check used, read returns -1.

Can anyone point me to why this is happening? I am compiling using cc in normal mode, not using DD64.

Upvotes: 11

Views: 28238

Answers (3)

Senna

Reputation: 378

In the first place, why do you need lseek() in your cycle? read() will advance the cursor in the file by the number of bytes read.

And, to the topic: where long is 32 bits wide (as on a 32-bit build), long, and therefore chunk, has a maximum value of 2147483647; any number greater than that cannot be represented and will typically wrap around to a negative value.

You want to declare chunk as off_t (off_t chunk) and size as size_t. That's the main reason why lseek() fails.
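To make the limit concrete, here is a small sketch of the arithmetic. The helper names fits_in_int32 and max_mib_steps_int32 are made up for illustration, and int32_t stands in for a 32-bit long; this is not code from the question.

```c
#include <assert.h>
#include <stdint.h>

/* The limit quoted above is exactly the largest signed 32-bit value. */
static_assert(INT32_MAX == 2147483647, "signed 32-bit maximum");

/* Hypothetical helper: can this byte offset be stored in a signed
   32-bit variable without overflowing? */
static int fits_in_int32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Hypothetical helper: how many whole 1 MiB steps (the chunk size used
   in the question) a signed 32-bit offset can take from 0 before the
   next step would exceed INT32_MAX. */
static int64_t max_mib_steps_int32(void)
{
    return (int64_t)INT32_MAX / 1048576;
}
```

A 4 GiB offset (4294967296) does not pass fits_in_int32, which is why the offset needs a 64-bit type such as a 64-bit off_t.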

And, then again, as other people have noticed, you do not want to free() your buffer inside the cycle.

Note also that you will overwrite the data you have already read. Additionally, read() will not necessarily read as much as you have asked it to, so it is better to advance chunk by the amount of the bytes actually read, rather than amount of bytes you want to read.
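A minimal sketch of handling short reads, assuming a POSIX system. The helper name read_full is hypothetical; it only illustrates advancing by the number of bytes that read() actually returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `want` bytes from fd into buf,
   looping until that many bytes have arrived, end of file is hit,
   or an error occurs. Returns the number of bytes stored (smaller
   than `want` only at end of file), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t done = 0;

    assert(buf != NULL || want == 0);
    while (done < want) {
        /* Ask only for what is still missing, at the right offset. */
        ssize_t n = read(fd, (char *)buf + done, want - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* end of file */
        done += (size_t)n;  /* advance by what was actually read */
    }
    return (ssize_t)done;
}
```

The return type is ssize_t rather than size_t so that the -1 error value stays distinguishable from a byte count.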

Taking all of this into account, the correct code should probably look something like this:

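// Edited: note comments after the code
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

int read_from_file_open(char *filename, size_t size)
{
    int fd;
    /* size is already a byte count, so allocate exactly size bytes. */
    char *buffer = malloc(size);
    if (buffer == NULL)
    {
        printf("\nAllocation Unsuccessful\n");
        return 0;
    }
    fd = open(filename, O_RDONLY|O_LARGEFILE);
    if (fd == -1)
    {
        printf("\nFile Open Unsuccessful\n");
        free(buffer);
        exit(EXIT_FAILURE);
    }
    /* No lseek() calls: the file opens at offset 0, and seeking to size
       before the loop would start the reads at the wrong position. */
    off_t chunk = 0;
    while ((size_t)chunk < size)
    {
        printf("bytes read so far: %lld\n", (long long)chunk);
        /* Never request more than the space left in the buffer. */
        size_t want = size - (size_t)chunk;
        if (want > 1048576)
            want = 1048576;
        /* ssize_t is signed, so the -1 error return can be detected. */
        ssize_t readnow = read(fd, buffer + chunk, want);
        if (readnow < 0)
        {
            printf("\nRead Unsuccessful\n");
            free(buffer);
            close(fd);
            return 0;
        }
        if (readnow == 0)
            break; /* end of file reached before size bytes */

        chunk = chunk + readnow;
    }

    printf("\nRead Successful\n");

    free(buffer);
    close(fd);
    return 1;
}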

I also took the liberty of removing the result variable and all related logic since, I believe, it can be simplified.

Edit: I have noted that some systems (most notably, BSD) do not have O_LARGEFILE, since it is not needed there. So, I have added an #ifndef at the beginning, which should make the code more portable.

Upvotes: 14

rashok

Reputation: 13504

If it is a 32-bit machine, reading a file larger than 4 GB will cause problems. If you are using the gcc compiler, try defining the large-file macros with the flags -D_LARGEFILE_SOURCE=1 and -D_FILE_OFFSET_BITS=64.

Please check this link also

If you are using any other compiler, check for similar compiler options.
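As a rough sketch of what the second flag does, the macro can also be defined at the top of the source file, as long as it comes before any system header. The helper name off_t_bits is made up for illustration, and the check assumes a C11 compiler and a glibc-style system.

```c
/* Same effect as passing -D_FILE_OFFSET_BITS=64 to gcc; it must come
   before the first system header so that off_t is widened. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Fail the build instead of silently using a 32-bit off_t. */
static_assert(sizeof(off_t) >= 8, "off_t must be at least 64 bits wide");

/* Hypothetical helper: width of off_t in bits, for logging or checks. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

If the static_assert fails, the platform either ignores the macro or needs a different option, which is worth finding out at compile time rather than at the 2 GB mark.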

Upvotes: 0

Jay

Reputation: 24915

The lseek function may have difficulty supporting big file sizes. Try using lseek64.

Please check the link to see the associated macros that need to be defined when you use the lseek64 function.
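For comparison, here is a sketch of seeking past 4 GiB with plain lseek, assuming a system where _FILE_OFFSET_BITS=64 gives a 64-bit off_t (glibc does this). It is an alternative to calling lseek64 directly, not a replacement for the macros mentioned above; the helper name seek_scratch_file and the scratch path are hypothetical.

```c
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static_assert(sizeof(off_t) >= 8, "off_t must be at least 64 bits wide");

/* Hypothetical helper: create a scratch file at `path`, seek to
   `target`, and return the resulting offset (or -1 on failure).
   Seeking beyond the end of a regular file is allowed and does not
   allocate disk space, so no large file is actually written. */
static off_t seek_scratch_file(const char *path, off_t target)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;

    off_t pos = lseek(fd, target, SEEK_SET);

    close(fd);
    unlink(path);   /* remove the scratch file again */
    return pos;
}
```

Called with a target of (off_t)5 << 30 (5 GiB), the returned position should equal the target when off_t is 64 bits wide.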

Upvotes: 3
