ZY Cao
ZY Cao

Reputation: 33

Why is the value you get from heapq.heappop(H) different from the value you get from H[0]?

I am trying to solve LeetCode problem 295. Find Median from Data Stream:

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far.
    Answers within 10-5 of the actual answer will be accepted.

Example 1:

[...]

MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

For this question, I am adding numbers to a heap and then later use the smallest one to do some operations.

When I tried to do the operation, I found out that my heap returns a different value when doing heapq.heappop(self.small) than when doing self.small[0].

Could you please explain this to me? Any hint is much appreciated.

(Every number in self.small is added using heapq.heappush)

Here is my code when it works:

class MedianFinder:
    def __init__(self): 
        self.small, self.large = [], [] 

    def addNum(self, num): 
        heapq.heappush(self.small, -1 * num) 
        if (self.small and self.large) and -1 * self.small[0] > self.large[0]: 
            val = -1 * heapq.heappop(self.small) 
            heapq.heappush(self.large, val) 
        if len(self.small) > len(self.large) + 1: 
            val = -1 * heapq.heappop(self.small) 
            heapq.heappush(self.large, val) 
        if len(self.large) > len(self.small) + 1: 
            val = -1 * heapq.heappop(self.large) 
            heapq.heappush(self.small, val)  

    def findMedian(self): 
        if len(self.small) > len(self.large): 
            return -1 * self.small[0] 
        elif len(self.small) < len(self.large): 
            return self.large[0] 
        else: 
            return (-1 * self.small[0] + self.large[0]) / 2 

For the last line, if I change:

-1 * self.small[0] + self.large[0]

into:

-1 * heapq.heappop(self.small) + heapq.heappop(self.large)

then the tests fail.

Why would that be any different?

Upvotes: 0

Views: 62

Answers (1)

trincot
trincot

Reputation: 350242

When you change -1 * self.small[0] + self.large[0] into -1 * heapq.heappop(self.small) + heapq.heappop(self.large) then it will still work the first time findMedian is called, but when it is called again, it will return (in general) a different result. The reason is that with heappop, you remove a value from the heap. This should not happen, as this changes the data that a next call of findMedian will have to deal with. findMedian is supposed to leave the data structure unchanged.

Note how the challenge says this:

double findMedian() returns the median of all elements so far.

I highlight the last two words. These indicate that findMedian is not (only) called when the whole stream of data has been processed, but will be called several times during the processing of the data stream. That makes it crucial that findMedian does not modify the data structure, and so heappop should not be used.

Upvotes: 2

Related Questions