jcm

Reputation: 5659

What does it mean to 'flush to disk'?

Could someone explain what is meant by flushing to disk in the following context? If I am writing data to a log on a filesystem, doesn't this mean I am putting it on disk? At what point would/should you flush a file to disk?

This suggests a design which is very simple: rather than maintain as much as possible in-memory and flush it all out to the filesystem in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.

(from https://kafka.apache.org/documentation.html#design).

Upvotes: 10

Views: 16373

Answers (2)

Zdol3232

Reputation: 7

It means the download status gets stuck at "flushing to disk": the file stays in that state forever and is never written to the hard drive after the files are downloaded.

It's an issue relating to skipping files with the partfile enabled in the advanced preferences.

Either turn off the partfile or stop skipping files.

Upvotes: 0

Charles Duffy

Reputation: 295805

All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.

What this means is that Kafka hands data off to the kernel with write() syscalls. At that point the data is visible to other processes, but it may or may not actually be reflected on disk, so it may not survive a crash or reboot. Kafka doesn't force the kernel to rush the data to disk with fsync() calls or similar (as appropriate for the OS at hand). If you're optimizing for throughput and don't need to guarantee that content is retrievable, this can be an appropriate decision: fsync() and its kin can be expensive calls (though by doing long contiguous writes that don't require seeking, Kafka minimizes the expense of its disk I/O).
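To make the write()/fsync() distinction concrete, here is a minimal sketch in Python, whose os.write() and os.fsync() are thin wrappers over the corresponding syscalls. It is not Kafka's code; the names append_record, demo, and example.log are made up for illustration.

```python
import os
import tempfile


def append_record(fd: int, record: bytes, durable: bool = False) -> int:
    """Append a record to an open log file descriptor.

    Hypothetical helper for illustration only; not taken from Kafka.
    """
    # write() hands the bytes to the kernel (the page cache). Other
    # processes can read them now, but they may not be on disk yet.
    written = os.write(fd, record)
    if durable:
        # fsync() asks the kernel to push this file's cached data to the
        # storage device before returning. This is the "flush to disk".
        os.fsync(fd)
    return written


def demo() -> bytes:
    path = os.path.join(tempfile.mkdtemp(), "example.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Written but not flushed: fast, not guaranteed to survive a crash.
        append_record(fd, b"first\n")
        # A separate reader already sees the data via the page cache.
        with open(path, "rb") as reader:
            seen = reader.read()
        # Written and flushed: slower, but durable once fsync() returns.
        append_record(fd, b"second\n", durable=True)
    finally:
        os.close(fd)
    return seen
```

Calling demo() returns what the second reader saw after the unflushed write. When to pass durable=True is a trade-off: flushing after every record maximizes durability, while flushing rarely (or leaving it to the kernel) maximizes throughput.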

Upvotes: 8
