user1623709
user1623709

Reputation: 89

Determining if an array has a k-majority element

Suppose that, given an n-element multiset A (not sorted), we want an O(n) time algorithm for determining whether A contains a majority element, i.e., an element that occurs more than n/2 times in A. It is easy to solve this in O(n) time by using the linear-time selection algorithm by finding the median (call it x), then counting how many times x occurs in A and returning it as the majority if the count exceeds n/2 (otherwise the answer is “there is no majority”). Now consider the following generalization of the problem: Given A and an integer k < n, we want an algorithm that determines whether A contains a value that occurs more than n/k times in it (if many such values exist, then it is enough to find one of them). Design an algorithm for doing this, and analyze its complexity as a function of n and k. Your grade on this question will depend on how fast your algorithm is (of course it also has to be correct). Partial credit of 10 points is given for an O(kn) time algorithm, full credit is for an O(n log k) time algorithm.

now I have come up with 2 solutions for the problem but neither fully satisfy the O(n log k) requirement. immediately i saw that i could sort the array using a O(n log n) algorithm then go through and see if any elements repeat more than n/k times linearly but that is O(n log n) not O(n log k)

I also have found and somewhat understood a O(nk) methood done by making an array of the same data type as the input and an int that is k long. then putting each element into an empty element incrementing its counter or if it matches one element in there incrementing its counter till we reach the k+1th unique element at which point you decrement all the counters by 1 till one reaches 0 at which point it is considered empty and the new element can be placed in it. and so on till the end of the input array. then checking all the elements left after we are done to see if they occur more than n/k times. but since this involves checking the n original elements against all k of the new arrays elements it is O(nk). any hints on how to do this problem in O(n log k)? I think the O(nk) algorithm is along the lines of how he wants us to think but I'm not sure where to go from here.

Upvotes: 9

Views: 4044

Answers (3)

Avi Cohen
Avi Cohen

Reputation: 3414

The method that you described just needs to be used recursively.

Remembering that select moves the elements that are less or equal to the median to the left of the median.

If A is of size n.

Find the median of A. Now find the median of each of the two sub multi-sets of length n/2 that were partitioned by the median. Find the median of each of the four sub multi-sets of length n/4 that were partitioned by the medians. Continue recursively until the leaves are of length n/k. Now the height of the recursive tree is O(lgk). On each level of the recursive tree, there are O(n) operations. If there exist a value that is repeated at least n/k times then it will be in one of these k with length of n/k sub multi-sets. The last operations is also done in O(n). So you get the requested running time of O(nlgk).

Upvotes: 7

Andrew Tomazos
Andrew Tomazos

Reputation: 68618

I don't know if you've seen this one, but it may help to give you ideas:

Suppose you know there is a majority element in an array L.

One way to find the element is as follows:

Def FindMajorityElement(L):

    Count = 0

    Foreach X in L

        If Count == 0
            Y = X

        If X == Y
            Count = Count + 1
        Else
            Count = Count - 1

    Return Y

O(n) time, O(1) space

Upvotes: 0

Peter de Rivaz
Peter de Rivaz

Reputation: 33509

O(kn) algorithm

I wonder if perhaps the O(kn) algorithm might be more along the lines of:

  1. Find k regularly spaced elements (using a similar linear select algorithm to the median)
  2. Count how many matches you get for each of these

With the idea being that if an element occurs n/k times, it must be one of these.

O(nlogk) algorithm

Perhaps you could use the scheme proposed in your question together with a tree structure to hold the k elements. This would then mean that the search for a match would only be log(k) instead of k, for an overall O(nlogk)?

Note that you should use the tree for both the first pass (where you are finding k candidates that we need to consider) and for the second pass of computing the exact counts for each element.

Also note that you would probably want to use a lazy evaluation scheme for decrementing the counters (i.e. mark whole subtrees that need to be decremented and propagate the decrements only when that path is next used).

O(n) algorithm

If you encounter this in real life, I would consider using a hash based dictionary to store the histogram as this should give a fast solution.

e.g. in Python you could solve this in (on average) O(n) time using

from collections import Counter
A=[4,2,7,4,6]
k=3

element,count = Counter(A).most_common()[0]

if count>=len(A)//k:
    print element
else:
    print "there is no majority"

Upvotes: 2

Related Questions