Rockstar
Rockstar

Reputation: 13

How to sort big file without using the disk,network or virtual memory?

Yesterday, I attended an interview for network security position and this question was asked:

Let's say you have a PC with 1GB RAM and the disk of this computer holds a 10 GB file which contain random numbers. What technique will you use for sorting the file and propose algorithm. You cannot be using the disk or network or virtual memory for sorting?

I tried many different ways as I could think of, proposed External sorting but the interviewer said that this was not the right way. At the end of the interview I ask him politely what is the way and the algorithm for the question he ask me but he refused to say like it was some kind of big secret.

My question is how would anyone approach this kind of problem cause i just can't stop thinking about it but still no clear answer?

Upvotes: 1

Views: 590

Answers (1)

Luke
Luke

Reputation: 848

To sort the file you will need to run your algorithm inside the RAM. Because the file is 10x larger than the available amount of RAM, you will need to sort the file into 10 (or more) separate chunks, and recombine them at the end.

Your algorithm would need to consist of..

  1. Read through the list on Disc looking for the largest number in the file (and keep track of it after reading through the list)
  2. After reading through the entire list place this number into a list on the RAM and repeat this process (each time looking for the biggest number smaller than the last) until all or most of the RAM is used up.
  3. Now add that list back onto Disc with an index representing its order, in this case 1, and clear up the RAM for more processing
  4. Repeat step 3 10x or more each time building a list of sorted numbers on RAM, until all numbers have been added to a seperate list on Disc.
  5. To finish up, check the indexes on the beginning of each list and (one list at a time) place them into their correct order on the Disc

Update: I'm adding to my answer to reflect some points made by @JimMischel

The algorithm in RAM would not just be keeping track of the biggest number, but also maintaining a separate integer count that would be incremented with every subsequent occurrence of the number on the file. The number would then be placed on the sub-list in RAM however many times the number occurs.

Update: Concerning the asker's question

The question posted by the OP states that "you cannot use the disc for Sorting". The question says nothing that would imply disc cannot be used for storage. I believe most reading the question interpreted this the wrong way and thus, without having a place to store any data, presumed the assigned task to be impossible.

Upvotes: -2

Related Questions