Reputation: 414079
Please, provide code examples in a language of your choice.
Update: No constraints set on external storage.
Example: Integers are received/sent via network. There is a sufficient space on local disk for intermediate results.
Upvotes: 14
Views: 32503
Reputation: 477
Here's a valid and fun solution.
Load half the numbers into memory. Heap sort them in place and write the output to a file. Repeat for the other half. Use external sort (basically a merge sort that takes file i/o into account) to merge the two files.
Aside: Make heap sort faster in the face of slow external storage:
Start constructing the heap before all the integers are in memory.
Start putting the integers back into the output file while heap sort is still extracting elements
Upvotes: 0
Reputation: 5449
Upvotes: 0
Reputation: 414079
Sorting a million 32-bit integers in 2MB of RAM using Python by Guido van Rossum
Upvotes: 11
Reputation: 9764
No example, but Bucket Sort has relatively low complexity and is easy enough to implement
Upvotes: -3
Reputation: 414079
Dual tournament sort with polyphased merge
#!/usr/bin/env python
import random
from sort import Pickle, Polyphase
nrecords = 1000000
available_memory = 2000000 # number of bytes
#NOTE: it doesn't count memory required by Python interpreter
record_size = 24 # (20 + 4) number of bytes per element in a Python list
heap_size = available_memory / record_size
p = Polyphase(compare=lambda x,y: cmp(y, x), # descending order
file_maker=Pickle,
verbose=True,
heap_size=heap_size,
max_files=4 * (nrecords / heap_size + 1))
# put records
maxel = 1000000000
for _ in xrange(nrecords):
p.put(random.randrange(maxel))
# get sorted records
last = maxel
for n, el in enumerate(p.get_all()):
if el > last: # elements must be in descending order
print "not sorted %d: %d %d" % (n, el ,last)
break
last = el
assert nrecords == (n + 1) # check all records read
Upvotes: 1
Reputation: 61508
This wikipedia article on External Sorting have some useful information.
Upvotes: 3
Reputation: 4872
As people above mention type int of 32bit 4 MB.
To fit as much "Number" as possible into as little of space as possible using the types int, short and char in C++. You could be slick(but have odd dirty code) by doing several types of casting to stuff things everywhere.
Here it is off the edge of my seat.
anything that is less than 2^8(0 - 255) gets stored as a char (1 byte data type)
anything that is less than 2^16(256 - 65535) and > 2^8 gets stored as a short ( 2 byte data type)
The rest of the values would be put into int. ( 4 byte data type)
You would want to specify where the char section starts and ends, where the short section starts and ends, and where the int section starts and ends.
Upvotes: -2
Reputation: 24546
You need to provide more information. What extra storage is available? Where are you supposed to store the result?
Otherwise, the most general answer: 1. load the fist half of data into memory (2MB), sort it by any method, output it to file. 2. load the second half of data into memory (2MB), sort it by any method, keep it in memory. 3. use merge algorithm to merge the two sorted halves and output the complete sorted data set to a file.
Upvotes: 4
Reputation: 89055
Split the problem into pieces small enough to fit into available memory, then use merge sort to combine them.
Upvotes: 18
Reputation: 26830
1 million 32-bit integers = 4 MB of memory.
You should sort them using some algorithm that uses external storage. Mergesort, for example.
Upvotes: 4