What is the exact performance cost of a context switch within the same thread (memory access -> page fault -> memory access again)?
How many CPU cycles are spent over the whole sequence (memory access -> page fault -> memory access again)?
If a thread still has time left in its time slice, does a page fault result in scheduling?
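
For reference, here is a rough user-space sketch of how the cost might be estimated on a Unix-like system. It touches every page of a fresh anonymous mapping twice: the first pass should take a page fault on most pages, the second pass should hit already-resident pages. The function names `touch_pages` and `measure` and the default of 4096 pages are my own choices for illustration. It reports wall-clock nanoseconds rather than CPU cycles, and Python interpreter overhead is included in both passes, so only the difference between the passes is meaningful. The `ru_nvcsw`/`ru_nivcsw` counters from `getrusage` give a hint as to whether the thread was descheduled during the faulting pass.

```python
import mmap
import resource
import time

PAGE = mmap.PAGESIZE


def touch_pages(buf, n_pages):
    """Write one byte into each page; return elapsed wall-clock nanoseconds."""
    start = time.perf_counter_ns()
    for i in range(n_pages):
        buf[i * PAGE] = 1
    return time.perf_counter_ns() - start


def measure(n_pages=4096):
    """Compare a first-touch pass (faulting) with a second pass (resident)."""
    buf = mmap.mmap(-1, n_pages * PAGE)  # anonymous mapping, not yet populated
    try:
        before = resource.getrusage(resource.RUSAGE_SELF)
        first_ns = touch_pages(buf, n_pages)   # accesses may page-fault
        after = resource.getrusage(resource.RUSAGE_SELF)
        second_ns = touch_pages(buf, n_pages)  # pages should now be resident
    finally:
        buf.close()
    return {
        "pages": n_pages,
        "first_touch_ns": first_ns,
        "second_touch_ns": second_ns,
        # Counter deltas cover the first (faulting) pass only.
        "minor_faults": after.ru_minflt - before.ru_minflt,
        "voluntary_switches": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_switches": after.ru_nivcsw - before.ru_nivcsw,
    }


if __name__ == "__main__":
    result = measure()
    extra_ns = result["first_touch_ns"] - result["second_touch_ns"]
    print(result)
    print("approx. extra ns per page on first touch:", extra_ns / result["pages"])
```

The per-page figure is only an estimate of the minor-fault path (no disk I/O involved). Converting it to cycles would need the CPU frequency or a cycle counter, which this sketch does not read.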