Michael
Michael

Reputation: 42100

How to sort (million/billion/...) integers?

Sometimes interviewers ask how to sort million/billion 32-bit integers (e.g. here and here). I guess they expect the candidates to compare O(NLog(N)) sort with radix sort. For million integers O(NLog(N)) sort is probably better but for billion they are probably the same. Does it make sense ?

Upvotes: 19

Views: 37226

Answers (5)

Jiaji Li
Jiaji Li

Reputation: 1269

As aaaa bbbb said, it depends on the situation. You would ask questions about the project requirements. For example, if they want to count the ages of the employees, you probably use the Counting sort, I can sort the data in the memory. But when the data are totally random, you probably use the external sorting. For example, you can divide the data of the source file into the different files, every file has a unique range(File1 is from 0-1m, File2 is from 1m+1 - 2m , ect ), then you sort every single file, and lastly merge them into a new file.

Upvotes: 6

aaaa bbbb
aaaa bbbb

Reputation: 3043

If you get a question like this, they are not looking for the answer. What they are trying to do is see how you think through a problem. Do you jump right in, or do you ask questions about the project requirements?

One question you had better ask is, "How optimal of solution does the problem require?" Maybe a bubble sort of records stored in a file is good enough, but you have to ask. Ask questions about what if the input changes to 64 bit numbers, should the sort process be easily updated? Ask how long does the programmer have to develop the program.

Those types of questions show me that the candidate is wise enough to see there is more to the problem than just sorting numbers.

Upvotes: 47

Andrej Kirejeŭ
Andrej Kirejeŭ

Reputation: 5491

Use bit map. You need some 500 Mb to represent whole 32-bit integer range. For every integer in given array just set coresponding bit. Then simply scan your bit map from left to right and get your integer array sorted.

Upvotes: 5

zwol
zwol

Reputation: 140846

It depends on the data structure they're stored in. Radix sort beats N-log-N sort on fairly small problem sizes if the input is in a linked list, because it doesn't need to allocate any scratch memory, and if you can afford to allocate a scratch buffer the size of the input at the beginning of the sort, the same is true for arrays. It's really only the wrong choice (for integer keys) when you have very limited additional storage space and your input is in an array.

I would expect the crossover point to be well below a million regardless.

Upvotes: 1

The Archetypal Paul
The Archetypal Paul

Reputation: 41779

I expect they're looking for you to expand on the difference between internal sorting and external sorting. Apparently people don't read Knuth nowadays

Upvotes: 28

Related Questions