Reputation: 373
I created a program that finds the median of a list of numbers. The list of numbers is dynamic in that numbers can be removed and inserted (duplicate numbers can be entered) and during this time, the new median is re-evaluated and printed out.
I created this program using a multimap because
1) the benefit of it being already being sorted,
2) easy insertion, deletion, searching (since multimap implements binary search)
3) duplicate entries are allowed.
The constraints for the number of entries + deletions (represented as N) are: 0 < N <= 100,000.
The program I wrote works and prints out the correct median, but it isn't fast enough. I know that the unsorted_multimap is faster than multimap, but then the problem with unsorted_multimap is that I would have to sort it. I have to sort it because to find the median you need to have a sorted list. So my question is, would it be practical to use an unsorted_multimap and then quick sort the entries, or would that just be ridiculous? Would it be faster to just use a vector, quicksort the vector, and use a binary search? Or maybe I am forgetting some fabulous solution out there that I haven't even thought of.
Though I'm not new to C++, I will admit, that my skills with time-complexity are somewhat medicore.
The more I look at my own question, the more I'm beginning to think that just using a vector with quicksort and binary search would be better since the data structures basically already implement vectors.
Upvotes: 4
Views: 4627
Reputation: 9199
the more I look at my own question, the more I'm beginning to think that just using vector with quicksort and binary search would be better since the data structures basically already implement vectors.
If you have only few updates - use unsorted std::vector + std::nth_element algorithm which is O(N). You don't need full sorting which is O(N*ln(N)).
#include <algorithm>
#include <iterator>
#include <iostream>
#include <ostream>
#include <vector>
using namespace std;
template<typename RandomAccessIterator>
RandomAccessIterator median(RandomAccessIterator first,RandomAccessIterator last)
{
RandomAccessIterator m = first + distance(first,last)/2; // handle even middle if needed
nth_element(first,m,last);
return m;
}
int main()
{
vector<int> values = {5,1,2,4,3};
cout << *median(begin(values),end(values)) << endl;
}
Output is:
3
If you have many updates and only removing from middle - use two heaps as comocomocomocomo suggests. If you would use fibonacci_heap - then you would also get O(N) removing from arbitary position (if don't have handle to it).
If you have many updates and need O(ln(N)) removing from arbitary places - then use two multisets as ipc suggests.
Upvotes: 5
Reputation: 3023
Can any one help me what is Space and Time complexity of my following C# program with details.
//Passing Integer array to Find Extreme from that Integer Array
public int extreme(int[] A)
{
int N = A.Length;
if (N == 0)
{
return -1;
}
else
{
int average = CalculateAverage(A);
return FindExtremes(A, average);
}
}
// Calaculate Average of integerArray
private int CalculateAverage(int[] integerArray)
{
int sum = 0;
foreach (int value in integerArray)
{
sum += value;
}
return Convert.ToInt32(sum / integerArray.Length);
}
//Find Extreme from that Integer Array
private int FindExtremes(int[] integerArray, int average) {
int Index = -1; int ExtremeElement = integerArray[0];
for (int i = 0; i < integerArray.Length; i++)
{
int absolute = Math.Abs(integerArray[i] - average);
if (absolute > ExtremeElement)
{
ExtremeElement = integerArray[i];
Index = i;
}
}
return Index;
}
Upvotes: 2
Reputation: 4942
If your purpose is to keep track of the median on the fly, as elements are inserted/removed, you should use a min-heap and a max-heap. Each one would contain one half of the elements... There was a related question a couple of days ago: How to implement a Median-heap
Though, if you need to search for specific values in order to remove elements, you still need some kind of map.
You said that it is slow. Are you iterating from the beginning of the map to the (N/2)'th element every time you need the median? You don't need to. You can keep track of the median by maintaining an iterator pointing to it at all times and a counter of the number of elements less than that one. Every time you insert/remove, compare the new/old element with the median and update both iterator and counter.
Another way of seeing it is as two multimaps containing half the elements each. One holds the elements less than the median (or equal) and the other holds those greater. The heaps do this more efficiently, but they don't support searches.
If you only need the median a few times you can use the "select" algorithm. It is described in Sedgewick's book. It takes O(n) time on average. It is similar to quick sort but it does not sort completely. It just partitions the array with random pivots until, eventually, it gets to "select" on one side the smaller m elements (m=(n+1)/2). Then you search for the greatest of those m elements, and this is the median.
Upvotes: 4
Reputation: 8143
Here is how you could implement that in O(log N)
per update:
template <typename T>
class median_set {
public:
std::multiset<T> below, above;
// O(log N)
void rebalance()
{
int diff = above.size() - below.size();
if (diff > 0) {
below.insert(*above.begin());
above.erase(above.begin());
} else if (diff < -1) {
above.insert(*below.rbegin());
below.erase(below.find(*below.rbegin()));
}
}
public:
// O(1)
bool empty() const { return below.empty() && above.empty(); }
// O(1)
T const& median() const
{
assert(!empty());
return *below.rbegin();
}
// O(log N)
void insert(T const& value)
{
if (!empty() && value > median())
above.insert(value);
else
below.insert(value);
rebalance();
}
// O(log N)
void erase(T const& value)
{
if (value > median())
above.erase(above.find(value));
else
below.erase(below.find(value));
rebalance();
}
};
The idea is the following:
You can't use priority_queue
s because they won't let you remove one item.
Upvotes: 2
Reputation: 1243
You are almost certainly better off using a vector. Possibly maintaining an auxiliary vector of indexes to be removed between median calculations so you can delete them in batches. New additions can also be put into an auxiliary vector, sorted, then merged in.
Upvotes: 1