Reputation: 3635
This is a common interview question. You have a stream of numbers coming in (let's say more than a million). The numbers are between [0-999]).
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
private int[] store;
private long size;
private long total;
private int highestIndex;
private int lowestIndex;
public FindAverage() {
store = new int[1000];
size = 0;
total = 0;
highestIndex = Integer.MIN_VALUE;
lowestIndex = Integer.MAX_VALUE;
}
public void insert(int item) throws OutOfRangeException {
if(item < 0 || item > 999){
throw new OutOfRangeException();
}
store[item] ++;
size ++;
total += item;
highestIndex = Integer.max(highestIndex, item);
lowestIndex = Integer.min(lowestIndex, item);
}
public float getMean(){
return (float)total/size;
}
public float getMedian(){
}
}
I can't seem to think of a way to get the median in O(1) time. Any help appreciated.
Upvotes: 10
Views: 3734
Reputation: 178531
For the general case, where range of elements is unlimited, such data structure does not exist based on any comparisons based algorithm, as it will allow O(n)
sorting.
Proof: Assume such DS exist, let it be D
.
Let A
be input array for sorting. (Assume A.size()
is even for simplicity, that can be relaxed pretty easily by adding a garbage element and discarding it later).
sort(A):
ds = new D()
for each x in A:
ds.add(x)
m1 = min(A) - 1
m2 = max(A) + 1
for (i=0; i < A.size(); i++):
ds.add(m1)
# at this point, ds.median() is smallest element in A
for (i = 0; i < A.size(); i++):
yield ds.median()
# Each two insertions advances median by 1
ds.add(m2)
ds.add(m2)
Claim 1: This algorithm runs in O(n)
.
Proof: Since we have constant operations of add() and median(), each of them is O(1)
per iteration, and the number of iterations is linear - the complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting n times m1
, the median is the smallest element in A
. Each two insertions after it advances the median by one item, and since the advance is sorted, the total output is sorted.
Since the above algorithm sorts in O(n)
, and not possible under comparisons model, such DS does not exist.
QED.
Upvotes: 2
Reputation: 71019
The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers - the median index(i.e. the value of the median) and the number of values you've read and that are on the left(or right) of the median. I will just stop here hoping you will be able to figure out how to continue on your own.
EDIT(as pointed out in the comments): you already have the array with the sorted elements(stored
) and you know the number of elements to the left of the median(size/2
). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.
Upvotes: 3
Reputation: 159260
You have already done all the heavy lifting, by building the store
counters. Together with the size
value, it's easy enough.
You simply start iterating the store
, summing up the counts until you reach half of size
. That is your median value, if size
is odd. For even size
, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n
, i.e. performance is unchanged even if n
reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.
Upvotes: 10