Shaurya Chaudhuri
Shaurya Chaudhuri

Reputation: 3842

Merging K- Sorted Lists using Priority Queue

I have been asked in my algorithm class to make a K-way merge algorithm which is of O(nlogk) After searching i found it could be done via making a k length priority queue and enqueuing it with the first element of each list. Extract the minimum, append it to result and enqueue from the list whose element has been extracted. I am confused about:

  1. How will it know when a particular list is exhausted, suppose a list has smaller elements than every other element in other lists?

  2. How will it know the element was of which list (if a structue is not used to define)?

  3. How is the time complexity O(nlogk)?

EDIT:

It would be a bit more helpful if someone can write down the algorithm step-wise, because all i have read it is in sentences and its hard to understand the way it is, if someone could write down the algorithm might be helpful to understand.

Upvotes: 3

Views: 8395

Answers (7)

Argus Malware
Argus Malware

Reputation: 787

here is my code using c++ stl

#include<iostream>
#include<vector>
#include<climits>
#include<queue>
#define ROWS 4
#define COLS 8
using namespace std;

struct node
{
    int ele;
    int arr_no;
    int next_index; 
};

void printVector(vector<int> v)
{
    for(unsigned int i=0;i<v.size();i++)
        cout<<v[i]<<" ";
}

// THIS IS THE BASIS ON WHICH THE ELEMENTS OF A PRIORITY QUEUE ARE SORTED AND 
// KEPT IN THE QUEUE, HERE THE CRITERIA IS THAT THE NODE WITH SMALLER ELEMENT SHOULD
// COME ABOVE THE ONE WITH LARGER ELEMENT

class compare
{
    public:
        bool operator()(node& n1, node& n2)
        {
           if (n1.ele > n2.ele) 
                return true;
           else
                return false;
        }
};

vector<int> mergeKArrays(vector< vector<int> > v)
{
    int k = v.size();       // NUMBER OF LISTS
    int n = v.at(0).size(); //SIZE OF EACH LIST

    vector<int> result;
    //result.resize( n*k );

    priority_queue<node, vector<node>, compare> minHeap;
    for (int i = 0; i < k; i++)
    {
        node temp;
        temp.ele = v[i][0]; //STORE THE FIRST ELEMENT
        temp.arr_no = i;        //INDEX OF ARRAY
        temp.next_index = 1;    //INDEX OF NEXT ELEMENT TO BE STORED FROM ARRAY
        minHeap.push(temp);
    }

    // NOW ONE BY ONE GET THE MINIMUM ELEMENT FROM MIN
    // HEAP AND REPLACE IT WITH NEXT ELEMENT OF ITS ARRAY
    for (int count = 0; count < n*k; count++)
    {
        // GET THE MINIMUM ELEMENT AND STORE IT IN OUTPUT
        node min_ele_node = minHeap.top();
        minHeap.pop();      
        result.push_back(min_ele_node.ele);

        // FIND THE NEXT ELELEMENT THAT WILL REPLACE CURRENT
        // ROOT OF HEAP. THE NEXT ELEMENT BELONGS TO SAME
        // ARRAY AS THE CURRENT ROOT.
        node new_node;
        new_node.arr_no = min_ele_node.arr_no;
        if (min_ele_node.next_index < n)
        {
            new_node.ele = v.at(min_ele_node.arr_no)[min_ele_node.next_index];
            new_node.next_index = min_ele_node.next_index + 1;
        }
        // IF ROOT WAS THE LAST ELEMENT OF ITS ARRAY
        else 
        {
            new_node.ele =  INT_MAX; //INT_MAX IS FOR INFINITE
        }

        // REPLACE ROOT WITH NEXT ELEMENT OF ARRAY
        minHeap.push(new_node);
    }
    return result;
}


int main()
{
    int arr[ROWS][COLS] = 
                    { 
                        {10, 20, 30, 40, 50, 60, 71, 86},
                        {15, 25, 35, 45, 60, 69, 77, 78},
                        {27, 29, 37, 48, 50, 65, 75, 78},
                        {32, 33, 39, 50, 80, 133, 139, 150},
                    }; 

    vector< vector<int> > matrix ;

    for( int i=0 ; i < ROWS; i++)
    {
        vector<int> vec;
        for(int j=0; j < COLS; j++)
            vec.push_back(arr[i][j]);
        matrix.push_back(vec);
    }

    vector<int> result = mergeKArrays(matrix);
    printVector(result);
    return 0;
}

Upvotes: 0

Paul Hankin
Paul Hankin

Reputation: 58211

Here's some Python 2 code that does the merging.

import heapq

def addtoheap(h, i, it):
    try:
        heapq.heappush(h, (next(it), i))
    except StopIteration:
        pass

def mergek(*lists):
    its = map(iter, lists)
    h = []
    for i, it in enumerate(its):
        addtoheap(h, i, it)
    while h:
        v, i = heapq.heappop(h)
        addtoheap(h, i, its[i])
        yield v

for x in mergek([1, 3, 5], [2, 4, 6], [7, 8, 9], [10]):
    print x

Why is it O(n log k)? Well for each value removed, there's a heap pop and possibly a heap push (both of which are O(log k)). Since we remove n elements, it's O(n log k).

Upvotes: 7

yinchuandong
yinchuandong

Reputation: 41

Paul Hankin's solution is right, but it's a little bit difficult to read, especially you wanna implement in c++ or java. My solution is similar to Paul's. If you write in c++ or java, you may need an extra data structure to store the value of the element, index of the element in k-th array and index of the array in lists.

Element{
    int value;
    int idInArray,
    int idInList
}

But in python, I just store in a tuple (value, idInArray, idInList)

def mergeKArray(*lists):
    # implemented by min heap
    h = []
    r = []
    for k, arr in enumerate(lists):
        heapq.heappush(h, (arr[0], 0, k))
    while h:
        # min is the minimum element
        # i is the index of the min in the k-th array
        # k is the index of array in the list
        min, i, k = heapq.heappop(h)
        r.append(min)
        if i < len(lists[k]) - 1:
            i += 1
            heapq.heappush(h, (lists[k][i], i, k))
    return r

Because I just need to maintain a min heap with k elements, the time of eject or inject heap is O(log k). I also need to scan all n elements, and each element cost 2*log(k) time to inject and eject. Therefore, the big-O is O(n*log k).

Upvotes: 0

user2228512
user2228512

Reputation: 37

You should not here that "n" the total number of nodes over all the list not just one list. In this case the solution is O(n logk). If we mean that we have n node on an average in every list(total of k lists) then it will be O(nk logk)

Here is an in depth explanation with some code

Upvotes: 0

Jim Mischel
Jim Mischel

Reputation: 133975

I wrote a series of articles about this some years ago when discussing sorting a large text file. The idea is that the items you put on the heap contain not just the value but also the list the value came from. Or, you could just put the list reference on the heap and have the comparison function compare against the first item in the particular list.

See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=676 and http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=677 for an explanation of the basic algorithm using a sequential list in place of a heap. See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=680 for an improved version that uses a heap.

As I said in my comment, you can also do the merge without a heap. See https://stackoverflow.com/a/18984961/56778 for a description.

Upvotes: 2

digital_revenant
digital_revenant

Reputation: 3324

Instead of simply storing the first element of each list in the priority queue, wrap it in a structure like this;

struct wrapper
{
    int list_number;
    int element;
}

Then, when you are pushing an element onto the priority queue, just add the list number form where it came. This way, when the minimum element gets popped, you will know from which list you should push the next element to push on the queue by examining popped_element.list_number.

In order to find if your list is empty you should add a function empty to it that returns true if the list does not have any more elements and false otherwise. The function would be very easy to implement. Just check if the size is zero then the list is empty otherwise it has one or more elements.

From your question I assume that a binary heap is used to implement the priority queue. The insertion operation in a binary heap takes O(lg k) time and the extract-min operation also takes O(lg k) times where k is the the size of the heap (number of lists in your case). Now if the total number of elements you have is n, the total time to process all of them will be O(n lg k).

Upvotes: 2

Mateusz Dymczyk
Mateusz Dymczyk

Reputation: 15141

1 list is exhausted when it has no more elements

2 you need to keep track from which list did the min element com

3 for each element you put it into a min heap of size k which takes logk so you have n times logk

Upvotes: 0

Related Questions