Learner
Learner

Reputation: 21393

Find pairs with least difference

I have an array of numbers 3,1,3,5 and a number k with the value 3. I want to find pairs whose absolute adjacent difference between them is the smallest. Find k such pairs.

All possible pairs are :

|3 - 1| = 2
|3 - 3| = 0
|3 - 5| = 2
|1 - 3| = 2
|1 - 5| = 4
|3 - 5| = 2

For smallest k = 3 values are [0,2,2] is the required answer

My approach:

public static List<Integer> process(List<Integer> list, int k) {
    int n = list.size();
    List<Integer> all = new ArrayList<>();
    Collections.sort(list);
    for(int i=0; i<n; i++) {
        for(int j=i+1; j<n; j++) {
            all.add(list.get(j) - list.get(i));
        }
    }
    Collections.sort(all);
    List<Integer> answer = new ArrayList<>();
    for(int i=0; i<k; i++) {
        answer.add(all.get(i));
    }
    
    return answer;
}

Here I am trying to get all possible pairs and then get the smallest values. so time complexity is more for this program. How to reduce time complexity for this task.

Upvotes: 1

Views: 384

Answers (3)

wLui155
wLui155

Reputation: 742

Here's an outline of another approach, which avoids the use of a priority queue.

Reformulating the question, we'd like to find the smallest k differences in an array A, and write it to an output array R. Let count_pairs(arr, d) be a placeholder function that tells us how many pairs in an array arr have a difference less than/equal to d. Let N be the length of A.

  1. Sort A. Let B be the sorted array.
  2. Compute difference d > 0 where count_pairs(B, d) < k and count_pairs(B, d + 1) >= k. Note count_pairs(B, 0) >= k indicates you could just let R be an array of k zeroes. d can be found via binary search from 0 to the difference of the largest and smallest elements in A.
  3. After finding d, pick out the pairs whose difference is less than/equal to d. Write their differences to R.
  4. Fill up the remaining slots in R with d + 1; there should be exactly k - count_pairs(B, d) remaining slots.

Observations

  • count_pairs() can be implemented in O(N * log(N)) time via binary search if the input array is sorted. Don't think this can be improved by much.
  • If you implement 3. using binary search (similar to what's done for count_pairs()) the time complexity should be O(N * log(N) + count_pairs(B, d)).
  • If R needs to be in sorted order, you'll have to sort R before returning it, since 3. won't guarantee this - only that all values are smaller than/equal to d.

So, the overall time complexity of each step is as follows:

  1. Sorting - O(N * log(N))
  2. Searching for d - O(log(max(A) - min(A)) * N * log(N))
  3. Fill in the smallest values - O(N * log(N) + count_pairs(B, d))
  4. Fill in the remaining values of d + 1 - O(k - count_pairs(B, d))

I suspect that 2. will likely be the bottleneck and would be curious to see how well this approach compares to the priority queue one on actual data.

Upvotes: 0

btilly
btilly

Reputation: 46389

Here is a Python solution. It is somewhat similar to what was already suggested, except that putting values back into the heap allows non-adjacent differences to also be discovered.

Translation to Java left as an exercise for the reader.

import heapq
def closest_differences (elements, limit=None):
    e = sorted(elements)
    best = []
    for i in range(len(elements)-1):
        best.append((e[i+1] - e[i], i, i+1))

    heapq.heapify(best)
    yielded = 0
    while 0 < len(best):
        value, i, j = heapq.heappop(best)
        yield value
        yielded += 1
        if limit is not None and limit <= yielded:
            break
        if j+1 < len(e):
            heapq.heappush(best, (e[j+1] - e[i], i, j+1))


for d in closest_differences([3, 1, 3, 5], 3):
    print(d)

Upvotes: 2

Abhinav Mathur
Abhinav Mathur

Reputation: 8101

  1. Create a max heap/priority queue of size k.
  2. Use a nested loop to iterate over all possible differences.
  3. For each difference between consecutive elements, push the difference to the heap if heap.max > difference

O(n^2) for iteration, O(n^2 logk) for heap.
Total time complexity: O(n^2 logk)
Total space complexity: O(k) (for heap)

In your approach, you've sorted the list of all n^2 differences, so that has a complexity of O(n^2 logn).

Upvotes: 2

Related Questions