Masterminder

Reputation: 1135

Merging two lists efficiently with limited bound

I am trying to merge two arrays/lists in which every element has to be compared against the others. If an identical element appears in both, I increase its total occurrence count by one. The arrays are both 2D, where each element carries a counter for its number of occurrences. I know the two arrays can be compared with a double for loop in O(n^2), but I am limited to a bound of O(n log n). The final array should contain all of the elements from both lists, with their counters increased where there is more than one occurrence.

Array A[][] = [[8,1],[5,1]]
Array B[][] = [[2,1],[8,1]]

After the merge is complete I should get an array like so

Array C[][] = [[2,1],[8,2],[8,2],[5,1]]

The order of the elements in the result does not matter.

From my reading, mergesort merges two lists in O(n log n), but I am currently at a roadblock with my bound problem. Any pseudocode would be appreciated.

Upvotes: 2

Views: 714

Answers (4)

sehe

Reputation: 393457

You could write an algorithm to merge them by walking both sequences sequentially in order, inserting where appropriate.

I've chosen a (seemingly more apt) datastructure here: std::map<Value, Occurence>:

#include <map>
using namespace std;

using Value     = int;
using Occurence = unsigned;
using Histo     = map<Value, Occurence>;

If you insist on contiguous storage, boost::container::flat_map<> should be your friend here (and a drop-in replacement).

The algorithm (tested with your inputs, read comments for explanation):

void MergeInto(Histo& target, Histo const& other)
{
    auto left_it  = begin(target), left_end  = end(target);
    auto right_it = begin(other),  right_end = end(other);
    auto const& cmp = target.value_comp();

    while (right_it != right_end)
    {
        if ((left_it == left_end) || cmp(*right_it, *left_it))
        {
            // insert at left_it
            target.insert(left_it, *right_it);
            ++right_it; // and carry on
        } else if (cmp(*left_it, *right_it))
        {
            ++left_it; // keep left_it first, so increment it
        } else
        {
            // keys match!
            left_it->second += right_it->second;
            ++left_it;
            ++right_it;
        }
    }
}

It's really quite straight-forward!

A test program: See it Live On Coliru

#include <iostream>

// for debug output
static inline std::ostream& operator<<(std::ostream& os, Histo::value_type const& v) { return os << "{" << v.first << "," << v.second << "}"; }
static inline std::ostream& operator<<(std::ostream& os, Histo const& v) { for (auto& el : v) os << el << " "; return os; }
//

int main(int argc, char *argv[])
{
    Histo A { { 8, 1 }, { 5, 1 } };
    Histo B { { 2, 1 }, { 8, 1 } };

    std::cout << "A: " << A << "\n";
    std::cout << "B: " << B << "\n";

    MergeInto(A, B);
    std::cout << "merged: " << A << "\n";
}

Printing:

A: {5,1} {8,1} 
B: {2,1} {8,1} 
merged: {2,1} {5,1} {8,2} 

You could shuffle the interface a tiny bit in case you really wanted to merge into a new object (C):

// convenience
Histo Merge(Histo const& left, Histo const& right)
{
    auto copy(left);
    MergeInto(copy, right);
    return copy;
}

Now you can just write

Histo A { { 8, 1 }, { 5, 1 } };
Histo B { { 2, 1 }, { 8, 1 } };
auto C = Merge(A, B);

See that Live on Coliru, too

Upvotes: 0

Iuri Covalisin

Reputation: 645

Here's my algorithm based on bucket counting

time complexity: O(n)

memory complexity: O(max), where max is the maximum element in the arrays

Output: [8,2][5,1][2,1][8,2]

Code:

#include <iostream>
#include <vector>
#include <deque>

// counters is a deque rather than a vector on purpose: growing a deque
// at the back does not invalidate references to its existing elements,
// so the int& references stored in result stay valid across resizes.
int &refreshCount(std::deque<int> &counters, int in) {
    if (static_cast<int>(counters.size()) <= in) {
        counters.resize(in + 1);
    }
    return ++counters[in];
}

void copyWithCounts(std::vector<std::pair<int, int> >::iterator it,
                    std::vector<std::pair<int, int> >::iterator end,
                    std::deque<int> &counters,
                    std::vector<std::pair<int, int&> > &result
                    ) {
    while(it != end) {
        int &count = refreshCount(counters, (*it).first);
        std::pair<int, int&> element((*it).first, count);
        result.push_back(element);
        ++it;
    }
}

void countingMerge(std::vector<std::pair<int, int> > &array1,
                   std::vector<std::pair<int, int> > &array2,
                   std::vector<std::pair<int, int&> > &result) {
    std::deque<int> counters = {0};

    copyWithCounts(array1.begin(), array1.end(), counters, result);
    copyWithCounts(array2.begin(), array2.end(), counters, result);
}

int main()
{
    std::vector<std::pair<int, int> > array1 = {{8, 1}, {5, 1}};
    std::vector<std::pair<int, int> > array2 = {{2, 1}, {8, 1}};

    std::vector<std::pair<int, int&> > result;
    countingMerge(array1, array2, result);

    for(auto it = result.begin(); it != result.end(); ++it) {
        std::cout << "[" << (*it).first << "," << (*it).second << "] ";
    }

    return 0;
}

Short explanation: since you mentioned that the final arrangement is not necessary, I did a simple merge with counting (no sorting; nobody asked for sorting). The result holds references into the counters, so there is no need to walk back through the array to update the counts.

Upvotes: 0

A simple algorithm that requires extra memory would be to sort both inputs (O(n log n)) and then sequentially pick elements from the heads of both lists and merge them (O(n)). The overall cost is O(n log n) with O(n) extra memory (proportional to the size of the smaller of the two inputs).

Upvotes: 0

Dietmar Kühl

Reputation: 153955

I quite like Stepanov's Efficient Programming lectures, although they are rather slow-paced. In sessions 6 and 7 (if I recall correctly) he discusses the algorithms add_to_counter() and reduce_counter(). Both algorithms are entirely trivial, of course, but they can be used to implement a non-recursive merge sort without too much effort. The only possibly non-obvious insight is that the combining operation can reduce the two elements into a sequence rather than just one element. To do the operations in place you would actually store iterators (i.e., pointers in the case of arrays), using a suitable class to represent a partial view of an array.

I haven't watched the sessions beyond session 7 (and actually not even all of session 7 yet), but I would fully expect that he presents how to use the counter produced in session 7 to implement, e.g., merge sort. Of course, the run-time complexity of merge sort is O(n log n), and when using the counter approach it will use O(log n) auxiliary space.

Upvotes: 2
