eager_coder
eager_coder

Reputation: 47

How do I find the 10 largest subarrays efficiently?

I need to find 10 largest subarrays (maximum length) in an array with the condition arr[high] - arr[low] < delta. It's taking 50 seconds right now (using Python). I'm able to find the maximum sub-array by modifying the algorithm to find maximum subarray with sum < somevalue. Right now, I'm just using a for loop and deleting the maximum sub-array found after every iteration. I tried a lot of things but am back to this now as nothing worked correctly. The array is sorted.

with open(in_file) as f_in, open(out_file, 'w') as f_out:
    dct = {}        
    mainlst = []
    # Read a file and store values in mainlst and map to some strings using dct

    for i in range(10): 
        start = 0
        end = 0
        maxim = 0
        diff = 0
        current = 1
        max_start = 0
        max_end = 0
        while end < len(mainlst)-1:
            end += 1
            diff = mainlst[end] - mainlst[start]                
            current += 1
            while diff > delta:
                start += 1
                diff = mainlst[end] - mainlst[start]
                current -= 1
            if maxim < current:
                maxim = current
                max_start = start
                max_end = end

        print("".join([dct[mainlst[max_start]], ",", str(maxim)]), file=f_out)

        del mainlst[max_start:max_end+1]

Edit: I forgot to mention another condition. The sub-arrays cannot overlap.

Upvotes: 0

Views: 162

Answers (1)

shole
shole

Reputation: 4094

There is a O(N lg N) algorithm:

  1. Iterate through each element, from small to large, set current element to A[low], O(N)
  2. Binary search the index of A[high] which fulfill the inequality, O(lg N)
  3. Push the length and the pair of (low, high) in a priority queue or any data structure which maintained order in O(lg N) times
  4. Pop the top 10, or top N items and that's the answer

EDITED

Thanks to @m69, using two pointers a better O(N) can be achieved:

  1. Iterate through each element, from small to large, set two pointers low and high pointing to the first element initially
  2. Move the high pointer to the right until A[high] - A[low] >= delta, push the length and the pair of (low, high) in a priority queue or any data structure which maintained order in O(lg N) times.

    For your special case, you can simply use an array of size 10 to store the longest 10 subarray, then you can use O(1) to maintain this array.

  3. Move low pointer to right , repeat step 2.

Note that low is always smaller than or equal to high, and both pointers always move to right only, each iterate through the list once. So it is O(N), or it is O(N lg N) for general case using priority queue.

Upvotes: 2

Related Questions