Jihyun

Reputation: 1105

Cannot write(2) a file larger than 2GB (up to 2TB)

I have a program written in C. It computes something and writes the output to a file. My problem is that it never writes more than 2GB. Here is a simplified version of the code:

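#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <malloc.h>   /* malloc_usable_size() is a glibc extension */
#include <errno.h>
int main(void) {
    size_t size = 3221225472ULL;   /* 3 GiB */
    char *map = malloc(size);
    if (map == NULL) { perror("malloc"); return 1; }
    size_t allocated = malloc_usable_size(map);
    int fd = open("myfile", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0644);
    if (fd == -1) { perror("open"); return 1; }
    /* one single write() of the whole buffer */
    ssize_t written = write(fd, map, size);
    printf("allocated=%zu written=%zd errno=%d\n", allocated, written, errno);
    close(fd);
    free(map);
    return 0;
}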

The output file "myfile" is created, but its size is always 2GB (2147479552 bytes) for any requested size greater than 2GB. malloc() successfully allocates memory of the requested size (in this case, "allocated" is 3GB), and errno after write() is 0.

The environment is as follows:

Compilation:

gcc code.c -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE

What could be the reason for this?

Addition:

After getting two responses, I added retry code as follows:

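int main(void) {
    size_t size = 3221225472ULL;
    char *map = malloc(size);
    if (map == NULL) { perror("malloc"); return 1; }
    int fd = open("myfile", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0644);
    if (fd == -1) { perror("open"); return 1; }
    size_t written = 0;
    /* write() may transfer fewer bytes than requested, so loop */
    while (written < size) {
        ssize_t n = write(fd, map + written, size - written);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            perror("write");
            return 1;
        }
        written += (size_t)n;   /* n >= 0 here */
    }
    close(fd);
    free(map);
    return 0;
}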

Upvotes: 2

Views: 2264

Answers (2)

Some file systems impose strict limits on file size, notably FAT32 (at most 4 GiB minus one byte per file). Disk quotas and resource limits (see setrlimit(2) with RLIMIT_FSIZE) also limit the file size, as of course does the free space on the file system itself.
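To check whether a resource limit is involved, a process can query its own RLIMIT_FSIZE with getrlimit(2). A process that tries to grow a file past this limit receives SIGXFSZ (which terminates it by default; if the signal is caught or ignored, write(2) fails with EFBIG). A minimal sketch; the helper name fsize_soft_limit is hypothetical:

```c
#include <sys/resource.h>

/* Hypothetical helper: query the soft RLIMIT_FSIZE limit of this process.
   Returns 1 and stores the limit (in bytes) in *out when a finite limit
   is set, 0 when the limit is RLIM_INFINITY, and -1 if getrlimit() fails. */
int fsize_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_FSIZE, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;          /* no file size limit */
    *out = rl.rlim_cur;
    return 1;
}
```

If it returns 0, RLIMIT_FSIZE is not what truncates the file.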

BTW, after your call to write(2), your written is very probably only about 2^31 (precisely 0x7ffff000, i.e. 2147479552, bytes). You should check it.

And write(2) documents:

On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)

Of course, you should never assume that a single call to write(2) has written all the requested bytes (this is true on all POSIX systems, and was already true on the Unix systems of the 1980s). For example, a write to a pipe(7) can return after transferring only part of the data (a non-blocking pipe holds at most 64 KiB by default on Linux).
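The usual remedy is a loop that keeps calling write(2) on the remaining bytes until everything is written, retrying when a signal interrupts the call (EINTR) and treating -1 as an error rather than adding it to the count. A minimal sketch, with write_all as a hypothetical helper name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd, looping over
   short writes. Returns 0 on success, -1 on error (errno set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;                /* skip what was actually written */
        len -= (size_t)n;
    }
    return 0;
}
```

With it, the question's program would call write_all(fd, map, size) once and check for -1; on Linux, a 3 GiB buffer then takes at least two write(2) calls.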

BTW, a huge single call to write(2) is probably (or at least could well be) less efficient than several calls with a smaller buffer. The optimal buffer size is implementation-specific (it also depends on the page cache and on the hardware), but might be a few dozen kilobytes, or at most a megabyte.
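A sketch of a write loop that also caps every request at a fixed chunk size. write_chunked is a hypothetical name, and the chunk size is left to the caller (for example 1 MiB, in line with the estimate above, though you would want to benchmark on your own system):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf to fd using write() calls
   of at most chunk bytes each. Returns 0 on success, -1 on error. */
int write_chunked(int fd, const char *buf, size_t len, size_t chunk)
{
    if (chunk == 0) {
        errno = EINVAL;        /* a zero chunk size would never progress */
        return -1;
    }
    while (len > 0) {
        size_t want = len < chunk ? len : chunk;   /* cap each request */
        ssize_t n = write(fd, buf, want);
        if (n == -1) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;              /* the call may still be short: advance by n */
        len -= (size_t)n;
    }
    return 0;
}
```

For example, write_chunked(fd, map, size, 1 << 20) writes the question's 3 GiB buffer in 3072 calls of 1 MiB each, assuming every call completes fully.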

You may prefer the buffered fwrite(3) from <stdio.h>, but you should still check the returned count.
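A minimal sketch of the fwrite(3) route with the count checked; save_with_stdio is a hypothetical helper that writes a whole buffer to a new file. Because stdio buffers the data, an error may only surface when the buffer is flushed, so the return value of fclose(3) is checked too:

```c
#include <stdio.h>

/* Hypothetical helper: write n bytes of buf to a new file at path.
   Returns 0 on success, -1 on error. */
int save_with_stdio(const char *path, const void *buf, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    /* item size 1, so the returned item count is a byte count */
    size_t done = fwrite(buf, 1, n, f);
    if (done != n) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the stdio buffer; a failure here means lost data */
    if (fclose(f) == EOF)
        return -1;
    return 0;
}
```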

Finally, you might consider using mmap(2) in your case; see also msync(2).
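A sketch of the mmap(2) approach, assuming the final file size is known in advance: the file is first extended with ftruncate(2), then mapped with MAP_SHARED, filled, and flushed with msync(2). No write(2) call is involved, so its per-call cap does not apply. save_with_mmap is a hypothetical name, and the memcpy only stands in for your computation, which could write into the mapping directly instead of into a malloc'ed buffer:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path containing exactly len bytes copied
   from src, through a shared file mapping. Returns 0 on success, -1 on
   error. Assumes len > 0 (mmap() rejects zero-length mappings). */
int save_with_mmap(const char *path, const void *src, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    /* Size the file first: touching mapped pages past EOF raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;
    memcpy(map, src, len);     /* stand-in for computing into the mapping */
    /* Flush the dirty pages to the file before unmapping. */
    int rc = msync(map, len, MS_SYNC);
    munmap(map, len);
    return rc == -1 ? -1 : 0;
}
```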

Notice that for large files the real bottleneck is the hardware (the disk itself), so using buffered fwrite does not matter much for performance.

(You mention a terabyte file in a comment.)

BTW, for terabyte-sized datasets, a higher-level approach (notably a database, perhaps with SQLite, or an indexed file à la GDBM) could actually be more efficient, because you can then write only a portion of the data (or because the RDBMS runs on a remote database server, e.g. MariaDB or PostgreSQL). YMMV. But disk bandwidth is typically well below a gigabyte per second, so writing a terabyte might take a few hours (at 100 MB/s, 10^12 bytes take 10^4 seconds, almost three hours). And even with a huge swap area, you won't be able to malloc a terabyte on a 32 GB machine without thrashing.

You might also use posix_fadvise(2) cleverly to improve performance slightly (but not by much: for terabyte files, the bottleneck is the hardware).

Upvotes: 4

phuclv

Reputation: 41952

According to the man page (emphasis mine):

On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)
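The cap accounts for the exact file size reported in the question. Assuming every call transfers the maximum (real calls may transfer less), the number of write(2) calls a buffer needs is a ceiling division. The macro LINUX_WRITE_CAP and the function calls_needed are hypothetical names, not part of any system header:

```c
#include <stddef.h>

/* Per-call limit documented in write(2) on Linux; our own macro name. */
#define LINUX_WRITE_CAP ((size_t)0x7ffff000)

/* Hypothetical helper: minimum number of write() calls needed for size
   bytes if each call is clamped to LINUX_WRITE_CAP and otherwise completes
   (an idealized model: real calls may return even less). */
size_t calls_needed(size_t size)
{
    return size == 0 ? 0 : (size - 1) / LINUX_WRITE_CAP + 1;
}
```

For the question's 3221225472-byte buffer, the first call writes 2147479552 bytes (exactly the file size observed), 1073745920 bytes remain, and calls_needed(3221225472) is 2.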

Upvotes: 15
