pythonic

Reputation: 21665

Cost of a page fault trap

I have an application which periodically (every 1 or 2 seconds) takes checkpoints by forking itself. Each checkpoint is a fork of the original process that stays idle until it is asked to take over when some error occurs in the original process.

Now my question is: how costly is the copy-on-write mechanism of fork? How much does the page fault trap cost that occurs whenever the original process writes to a memory page for the first time after taking a checkpoint? Copy-on-write ensures that the original process then gets a different physical page than the checkpoint.

In my opinion, the page fault trap overhead could be quite high: an interrupt occurs, we go from user space to kernel space, and then back from kernel space to user space. How many CPU cycles can I lose to such a page fault trap? Assume that the RAM is big enough and we never need to swap to the hard disk.

I know that it's difficult to imagine a checkpointing scheme more efficient than this, so you could ask why I'm worrying about page fault trap overhead, but I'm asking just to get an idea of how much this scheme will cost.

Upvotes: 14

Views: 4479

Answers (1)

Damon

Reputation: 70206

You can do the rough math for an educated guess yourself. Assuming no disk access (which would cost ~10 billion cycles), you have to account for:

  • 160 cycles for the trap and returning (approximately, on x86_64)
  • validity checks, quota, accounting, and whatnot (unknown, probably a few hundred to a thousand cycles)
  • aligned memcpy of 4096 bytes, something around 500-800 cycles
  • TLB invalidation (adds 10-100 cycles on first access)
  • either eviction of other cached data or one guaranteed cache miss (80-400 cycles), depending on the implementation of the memcpy. Which of the two is better depends a lot on your access pattern.
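Adding up those ranges is plain arithmetic. Here is a minimal sketch in C, assuming "a few hundred" means about 300 cycles (that is just one reading of the guess above, not a measured number). The names `cycle_range`, `cow_fault_budget` and `cow_fault_total` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the estimate: a cost item with a low and a high guess, in cycles. */
struct cycle_range {
    const char *what;
    unsigned lo;
    unsigned hi;
};

/* The items from the list above. The 300 for "a few hundred" is an assumption. */
static const struct cycle_range cow_fault_budget[] = {
    { "trap and return",               160,  160 },
    { "checks, quota, accounting",     300, 1000 },
    { "aligned memcpy of 4096 bytes",  500,  800 },
    { "TLB invalidation",               10,  100 },
    { "cache eviction or cache miss",   80,  400 },
};

/* Sum the low and high guesses separately. */
struct cycle_range cow_fault_total(void)
{
    struct cycle_range total = { "total", 0, 0 };
    size_t n = sizeof cow_fault_budget / sizeof cow_fault_budget[0];

    for (size_t i = 0; i < n; i++) {
        total.lo += cow_fault_budget[i].lo;
        total.hi += cow_fault_budget[i].hi;
    }
    assert(total.lo <= total.hi);
    return total;
}
```

With these inputs the total comes out at 1050 to 2460 cycles.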

So all in all, we're talking of something around 2000 cycles, with some of the effects (e.g. TLB and cache effects) being spread out and not immediately visible. Omondi and Sedukhin reported 1700 cycles on P-III back in 2003, which is consistent with this estimate.
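If you would rather measure than guess, something like the following could give a ballpark figure on your own machine. It is only a sketch under a few assumptions: a Linux-like system with `fork`, `mmap` and `clock_gettime`, an otherwise idle child standing in for the checkpoint, and a hypothetical function name `cow_fault_cost_ns`. It reports nanoseconds rather than cycles, and transparent huge pages or other kernel settings may change what a single fault actually copies.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Write one byte into every page; return the average time per page in ns. */
static double touch_pages_ns(volatile char *buf, size_t npages, size_t page)
{
    double t0 = now_ns();
    for (size_t i = 0; i < npages; i++)
        buf[i * page] = (char)i;
    return (now_ns() - t0) / (double)npages;
}

/*
 * Rough extra cost per page (ns) of the first write after fork(),
 * compared to a second write to the same, now private, page.
 * Returns -1.0 if mmap() or fork() fails.
 */
double cow_fault_cost_ns(size_t npages)
{
    assert(npages > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1.0;
    memset(buf, 1, len);        /* populate pages so they are not the zero page */

    pid_t child = fork();       /* the "checkpoint" */
    if (child < 0) {
        munmap(buf, len);
        return -1.0;
    }
    if (child == 0) {
        for (;;)
            pause();            /* idle until killed, keeps the pages shared */
    }

    double first = touch_pages_ns(buf, npages, page);   /* copy-on-write faults */
    double second = touch_pages_ns(buf, npages, page);  /* no faults any more */

    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    munmap(buf, len);
    return first - second;
}
```

Calling `cow_fault_cost_ns(4096)` a few times and multiplying the result by your clock rate in GHz gives an approximate cycle count to compare with the estimate above. The number includes the cache and TLB effects of the touched page, but not the later misses caused by whatever the copy evicted.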

Note that if the page has never been written to before, things are slightly different according to a comment by L. Torvalds back in 2000. A copy-on-write miss on a zero page pulls another zero page from the pool and doesn't copy zeroes. That's pretty much a guaranteed cache miss too, though.

Upvotes: 19
