Reputation: 5659
Could someone explain what is meant by flushing to disk in the following context? If I am writing data to a log on a filesystem, doesn't this mean I am putting it on disk? At what point would/should you flush a file to disk?
This suggests a design which is very simple: rather than maintain as much as possible in-memory and flush it all out to the filesystem in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
(from https://kafka.apache.org/documentation.html#design).
Upvotes: 10
Views: 16373
Reputation: 7
It means that the download status gets stuck at "flushing to disk": the file stays in that state forever and is never written to the hard drive after the download finishes.
It's an issue relating to skipping files with the partfile enabled in the advanced preferences.
Either turn off the partfile or stop skipping files.
Upvotes: 0
Reputation: 295805
All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
What this means is that Kafka hands data off to the kernel with write() syscalls -- at which point it's visible to other processes but may or may not actually be reflected on disk and survive a reboot -- but doesn't force the kernel to rush it to disk with fsync() calls or similar (as appropriate for the OS at hand). If you are optimizing for throughput and don't need to guarantee that content is retrievable after a crash, this can be an appropriate decision: fsync() and its kin can be expensive calls (though by doing long contiguous writes that don't require seeking, Kafka minimizes the expense of its disk I/O).
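To make the write()/fsync() distinction concrete, here is a minimal Python sketch. It is not how Kafka itself is implemented; the helper name append_record and its durable flag are made up for illustration. Without durable=True the data only reaches the kernel's page cache; with it, the call blocks until the kernel reports the data has been pushed to the storage device.

```python
import os
import tempfile


def append_record(path, payload, durable=False):
    """Append payload (bytes) to path; hypothetical helper for illustration.

    With durable=False the bytes are only handed to the kernel (page cache).
    With durable=True we additionally ask the kernel to flush them to disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        total = 0
        # os.write() may write fewer bytes than requested, so loop until done.
        while view:
            n = os.write(fd, view)
            view = view[n:]
            total += n
        if durable:
            # Blocks until the kernel reports the file's data is on the device.
            os.fsync(fd)
    finally:
        os.close(fd)
    return total


path = os.path.join(tempfile.mkdtemp(), "demo.log")
append_record(path, b"fast\n")                # page cache only
append_record(path, b"safe\n", durable=True)  # page cache, then fsync()

# Both records are readable right away, flushed or not, because reads are
# served from the same page cache.
with open(path, "rb") as f:
    contents = f.read()
print(contents)
```

Both records are visible to readers immediately; the difference only shows up if the machine loses power before the kernel has written the un-fsynced pages out on its own schedule.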
Upvotes: 8