Daffy

Reputation: 851

Allocate file without zeroing and without creating a sparse file in linux

My goal is to instantly allocate a lot of space to a file without making a sparse file. Reading from the file should output the garbage left in free space, rather than 0s.

Both truncate and fallocate make sparse files.

Is this possible?

Upvotes: 1

Views: 2469

Answers (1)

Alexis Wilke

Reputation: 20731

Can We Avoid the Zeroes?

No. It is not possible.

The kernel, for obvious security reasons, guarantees that the sectors released by a delete, a truncate, etc. read back as zeroes once they get reallocated. So when you allocate a new file, it is automatically all zeroes. That clearing may be virtual, as opposed to physically writing zeroes to disk--especially since a physical overwrite wouldn't be reliable on an SSD anyway (see shred(1) for details).

The only way to get really fast allocation without the zeroing is to create your own partition and manage it yourself. That is not an easy feat if you currently rely on the many features of ext4 or some other similar file system.
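For illustration, a minimal sketch of reading raw bytes straight off a partition you manage yourself (the device name is hypothetical, error checking is omitted, and the open() requires the appropriate privileges):

#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Read whatever bytes currently sit at the start of the partition;
// there is no file system here, so nothing gets zeroed for you.
int fd = open("/dev/sdb1", O_RDONLY);  // hypothetical device name
uint8_t sector[512];
read(fd, sector, sizeof(sector));
close(fd);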

Since the sectors should already be set to zeroes, there should not be any impact in terms of speed when allocating a new (large) file on disk.
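If instantly reserving zero-filled space is good enough, a minimal sketch using posix_fallocate(3) could look like this (filename and size are placeholder variables, and error checking is omitted):

#include <fcntl.h>
#include <unistd.h>

// Reserve the blocks up front without writing them; the allocated
// range reads back as zeroes, never as the disk's previous content.
int fd = open(filename, O_CREAT | O_WRONLY, 0600);
posix_fallocate(fd, 0, size);  // returns 0 on success
close(fd);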

Sparse Files

From experience, when you write zeroes to a file, those zeroes are physically written to disk. The OS does not turn the file into a sparse file for you.

In software, creating a sparse file requires you to explicitly enlarge the file with the truncate()/ftruncate() functions, or to lseek() past the end of the file before the next write(). If you do a write() of all zeroes, however, the OS does not try to transform those into a sparse file.
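For example, a minimal sketch of the ftruncate() approach, which does create a sparse file (filename and size are placeholders, and error checking is omitted):

#include <fcntl.h>
#include <unistd.h>

// The size is recorded in the inode, but no data blocks are
// allocated until something actually writes inside the file.
int fd = open(filename, O_CREAT | O_WRONLY, 0600);
ftruncate(fd, size);  // enlarge the file without writing any data
close(fd);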

By contrast, you could write something like the following in C++ and you will not get a sparse file:

#include <cstdint>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

int fd = open(filename, O_CREAT | O_WRONLY, 0600);
std::vector<uint8_t> buffer(size);  // zero-initialized by default
write(fd, buffer.data(), buffer.size());
close(fd);

This code sample assumes a relatively small size parameter. Otherwise, writing in a loop with a fixed-size buffer is much more efficient and much less likely to blow up your memory.
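Such a loop could look like this minimal sketch (the write_zeroes() name and the 64 KiB chunk size are my own choices):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unistd.h>
#include <vector>

// Write `size` zeroes to fd in fixed-size chunks so memory usage
// stays small no matter how large the file gets.
bool write_zeroes(int fd, std::size_t size)
{
    std::vector<uint8_t> buffer(64 * 1024);  // zero-initialized chunk
    while(size > 0)
    {
        std::size_t const count(std::min(size, buffer.size()));
        ssize_t const written(write(fd, buffer.data(), count));
        if(written <= 0)
        {
            return false;  // write error (check errno)
        }
        size -= static_cast<std::size_t>(written);
    }
    return true;
}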

In your console, that translates into using a tool which writes each byte to the destination file. It's going to be slow for very large files (writing 1 TB... you know...). Here is one that works that way:

head -c${SIZE} /dev/zero >"${OUTPUT}"

Note that some tools support sparse files on purpose. For example:

  • cp can be used to copy sparse files (see its --sparse option).
  • dd will do the work of finding blocks of zeroes in the input file and, with conv=sparse, seek over them in the output instead of writing the zeroes (a sketch of how a program can find the holes of a sparse file follows this list).
  • Etc.
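On Linux, a program can locate the holes of an existing sparse file with lseek(2) and the SEEK_HOLE/SEEK_DATA whence values; here is a minimal sketch (fd is assumed to be a file descriptor opened for reading):

#define _GNU_SOURCE 1  // SEEK_DATA / SEEK_HOLE (g++ usually defines this already)
#include <unistd.h>

// Walk a possibly sparse file, alternating between data extents and holes.
off_t pos(0);
for(;;)
{
    off_t const data(lseek(fd, pos, SEEK_DATA));
    if(data < 0)
    {
        break;  // no data past pos (ENXIO at end of file) or unsupported
    }
    off_t const hole(lseek(fd, data, SEEK_HOLE));  // the implicit hole at EOF always exists
    // [data, hole) contains data; [hole, next data) is a hole
    pos = hole;
}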

There are of course very good reasons for allocating a physical file on disk even if that operation is slow:

  • You are creating a database file; it would be really dangerous to use sparse files in this case (i.e. a write could fail at the wrong time), and allocating new blocks on the fly is slow, so your database throughput could suffer (although that only happens on writes that grow the file).
  • You are creating a virtual disk; I tested those as sparse files and it's just too terrible; at least on my old computers with HDDs, the VPS was way too slow while running.
  • You are creating a swap file; it's really not a good idea to use a sparse file for your swap (it's like looking for trouble on purpose!): between the slowness of allocating new blocks, the fact that the file is likely to end up fragmented, and the possibility that the disk is full at the very moment you need that swap space... (modern Linux kernels will in fact refuse to enable a swap file that has holes).

Upvotes: 2
