Jamaico
Jamaico

Reputation: 95

Finding Median in Order-Statistics tree in O(1) time

I've been asked to create a data structure in which functions like inserting a node and finding a node with a certain key value take O(logn). I've been asked to find the median in O(1) time.

I've been thinking about using a order statistic tree, will selecting the node with a N/2 rank find the median?

I've seen a similar question here but I would like a better explanation :(Find median in O(1) in binary tree)

Any ideas?

Thanks.

Upvotes: 1

Views: 3007

Answers (2)

Ian Fitchett
Ian Fitchett

Reputation: 91

I believe what Enzo Nakamura is referring to when he says finding the kth position is O(log n) is the use of a red-black binary search tree with a certain twist. At each node you additionally store the size of that node's subtree. This way you can take advantage of the properties of a binary search tree to find which value is the kth value in O(log n). However, the root node of a binary search tree is only guaranteed to be the median if it is a) an odd number of nodes and b) a perfectly balanced tree, and red-black binary search trees are not necessarily perfectly balanced, so even the approach above does not guarantee this.

There is a sort of famous question called the "running median" where you start off with the median and then have to calculate the new median every time you are given a new element. You can solve this by making a heap whose root is the median, whose left subtree is a min heap and whose right subtree is a max heap. Inserting to this heap is O(log n) time but once the tree is made, median retreival is always O(1) because the structure guarantees the root node is the median in all cases.

I think it should be possible to generalize this problem to find the kth value in constant time, as long as k does not change. In the running median problem, we need to rebalance the left and right subtrees every time one is 2 nodes larger than the other. We could generalize this problem to say we rebalance the left subtree (if it's the kth-smallest) or right subtree (if it's the kth-greatest) every time it's size exceeds k, with insert time and rebalance time still both O(log n). Now we can't use this approach to find any kth value like we can using the RB tree, but for the one specific k value, you are retreiving its value in constant time. This process is also referenced in this paper, which specifically states the ability to retrieve the kth value in constant time.

Upvotes: 3

Enzo Nakamura
Enzo Nakamura

Reputation: 135

Yes, the element on the ceil(n/2) position (starting at 1) is indeed the median (assuming the size of the sequence is odd). However, using an order-statistics tree to find the element on some k-th position is O(lg n), not O(1).

What you should know is: given a perfect binary search tree (a binary search tree in which all interior nodes have two children and all leaves have the same depth or same level), the median is exactly the root. More generally, if you have a binary search tree structure in which each node also stores the number of nodes in its subtree and it happens that both the left and right child of the root have that same number, then the root is the median. This situation is quite limited, so an idea would be balancing the binary search tree to increase the probability of the root to be the median.

Another idea is: keep a binary search tree and an extra variable (pointer) indicating where the median is now. Then, for every insert/delete operation, update this pointer.

Hope that helps!

Upvotes: 2

Related Questions