Reputation: 974
This is a question from the book 'Database Systems The complete book, 2nd Edition' - Chapter 15: Two-pass algorithm based on sorting. "Sometimes, it is possible to save some disk I/O ’s if we leave the last sublist in memory. It may even make sense to use sublists of fewer than blocks to take advantage of this effect. How many disk I/O ’s can be saved this way?"
I figured out that you divide the original relation into sub-lists and and sort them in the first pass, and keep the last list in memory, which will occupy fewer than M-1 block. Then how do you progress with sorting?
Upvotes: 3
Views: 1300
Reputation: 5295
This is just a guess, but I suspect the answer can be described as follows. Standard "level-at-a-time" merge sorting looks like this:
1 1 1 1 1 1 1 1
--- --- --- --- -- pass 1
2 2 2 2
----- ----- -- pass 2
4 4
--------- -- pass 3
8
Note here that we perform a complete pass of the input data before moving on to the next level.
An alternative is "subtree-at-a-time" merge sorting, which looks like this:
1 1 1 1 1 1 1 1
--- | | | | | |
2 --- | | | |
| 2 | | | |
----- | | | |
4 --- | |
| 2 ---
| | 2
| -----
| 4
---------
8
Here we are merging each subtree with its neighbour of the same height as soon as that neighbour has been constructed. We do the same amount of work, but locality is improved.
Cheers.
Upvotes: 1