Nimesh
Nimesh

Reputation:

Two-pass multi way merge sort?

If I have a relation (SQL) that does not fit in memory and I want to sort the relation using TPMMS (Two-pass multi-way merge sort method). How would I divide the table in sub tables (and how many) that can fit in memory and than merge them? Let's say I am using C#.

Upvotes: 4

Views: 7576

Answers (1)

Jonathan Leffler
Jonathan Leffler

Reputation: 754530

I've not hunted down the current definition of a two-pass multi-way merge sort, but the theory of 'external sorting' (where the data is too big to fit in memory) is pretty much standard. Any decent book on algorithms will cover it; amongst many others, you could look at Knuth, Sedgewick, or (for the software archaeologists) Kernighan & Plauger Software Tools.

The basic technique is simple:

  1. Read data until there is no space left.
  2. Sort it.
  3. Write to a temporary file.
  4. Repeat from 1 until there is no data left to read.
  5. You know know how many temporary files you have, N.
  6. You need to determine how many of those files you can read at one time, M.
  7. If N > M, then you design your merging phase so that the last phase will merge M files.
  8. You merge sets of M files into new temporary files until you reach the last merge.
  9. You merge the final set of M files (or N if N < M) writing to final destination.

All dreadfully standard - but there are nit-picky details to get right.

There was a good article in AT&T's Unix System Readings Volume II called 'Theory and Practice in the Construction of a Working Sort Routine' that you should find and read if you are serious about learning how to handle external sorts. However, when you read it, remember that machines have changed dramatically since it was written, with gigabytes of main memory (instead of megabytes) and terabytes of disk space (or SSD — also instead of megabytes).

Upvotes: 9

Related Questions