BreadCrumb
BreadCrumb

Reputation: 91

Algorithm to find k smallest numbers in an array in same order using O(1) auxiliary space

For example if the array is arr[] = {4, 2, 6, 1, 5}, and k = 3, then the output should be 4 2 1.

Upvotes: 6

Views: 1808

Answers (5)

Danny_ds
Danny_ds

Reputation: 11406

Solution with O(1) space and large k (for example 100,000) with only a few passes through the list.

In my first answer I presented a single pass solution using O(k) space with an option for single pass O(1) space if we are allowed to overwrite the data.

For data that cannot be overwritten, ciamej provided a O(1) solution requiring up to k passes through the data, which works great.

However, for large lists (n) and large k we may want a faster solution. For example, with n=100,000,000 (distinct values) and k=100,000 we would have to check 10 trillion items with a branch on each item + an extra pass to get those items.

To reduce the passes over n we can create a small histogram of ranges. This requires a small storage space for the histogram, but since O(1) means constant space (i.e. not depending on n or k) I think we're allowed to do that. That space could be as small as an array of 2 * uint32. Histogram size should be a power of two, which allows us to use bit masking.

To keep the following example small and simple, we'll use a list containing 16-bit positive integers and a histogram of uint32[256] - but it will work with uint32[2] as well.

First, find the k-th smallest number - only 2 passes required:

uint32 hist[256];

First pass: group (count) by multiples of 256 - no branching besides the loop
    loop:
        hist[arr[i] & 0xff00 >> 8]++;

Now we have a count for each range and can calculate which bucket our k is in.
Save the total count up to that bucket and reset the histogram.

Second pass: fill the histogram again,
    now masking the lower 8 bits and only for the numbers belonging in that range.
    The range check can also be done with a mask

After this last pass, all values represented in the histogram are unique
    and we can easily calculate where our k-th number is.

If the count in that slot (which represents our max value after restoring
    with the previous mask) is higher than one, we'll have to remember that
    when printing out the numbers.
    This is explained in ciamej's post, so I won't repeat it here.

---

With hist[4] and a list of 32-bit integers we would need 8 passes.

The algorithm can easily be adjusted for signed integers.

Example:

k = 7

uint32_t hist[256];  // can be as small as hist[2]

uint16_t arr[]:

88
258
4
524
620
45
440
112
380
580
88
178

Fill histogram with:
    hist[arr[i] & 0xff00 >> 8]++;

hist         count
0 (0-255)      6
1 (256-511)    3 -> k
2 (512-767)    3
...

k is in hist[1] -> (256-511)

Clear histogram and fill with range (256-511):

Fill histogram with:
    if (arr[i] & 0xff00 == 0x0100)
        hist[arr[i] & 0xff]++;

Numbers in this range are:

258 & 0xff =   2
440 & 0xff = 184
380 & 0xff = 124

hist         count
0              0
1              0
2              1 -> k
...            0
124            1
...            0
184            1
...            0

k - 6 (first pass) = 1
k is in hist[2], which is 2 + 256 = 258

Loop through arr[] to display the numbers <= 258 in preserved order.

Take care of possible duplicate highest numbers (hist[2] > 1 in this case).
    we can easily calculate how many we have to print of those.

Further optimization:

If we can expect k to be in the lower ranges, we can even optimize this further by using the log2 values instead of fixed ranges:

There is a single CPU instruction to count the leading zero bits (or one bits)
    so we don't have to call a standard log() function
    but can call an intrinsic function instead.

This would require hist[65] for a list with 64-bit (positive) integers.

We would then have something like:

        hist[ 64 - n_leading_zero_bits ]++;

This way the ranges we have to use in the following passes would be smaller.

Upvotes: 0

Danny_ds
Danny_ds

Reputation: 11406

Single pass solution - O(k) space (for O(1) space see below).

The order of the items is preserved (i.e. stable).

// Pseudo code

if ( arr.size <= k )
    handle special case

array results[k]
int i = 0;

// init
for ( ; i < k, i++) {   // or use memcpy()
    results[i] = arr[i]
}

int max_val = max of results

for( ; i < arr.size; i++) {

    if( arr[i] < max_val ) {
        remove largest in results    // move the remaining up / memmove()
        add arr[i] at end of results // i.e. results[k-1] = arr[i]
        max_val = new max of results
    }
}

// for larger k you'd want some optimization to get the new max
// and maybe keep track of the position of max_val in the results array

Example:

4 6 2 3 1 5

4 6 2   // init
4 2 3   // remove 6, add 3 at end
2 3 1   // remove 4, add 1 at end

// or the original:

4 2 6 1 5

4 2 6   // init
4 2 1   // remove 6, add 1 -- if max is last, just replace

Optimization:

If a few extra bytes are allowed, you can optimize for larger k:

create an array size k of objects {value, position_in_list}

keep the items sorted on value:
    new value: drop last element, insert the new at the right location
    new max is the last element

sort the end result on position_in_list

for really large k use binary search to locate the insertion point

O(1) space:

If we're allowed to overwrite the data, the same algorithm can be used, but instead of using a separate array[k], use the first k elements of the list (and you can skip the init).

If the data has to be preserved, see my second answer with good performance for large k and O(1) space.

Upvotes: 1

bit-shashank
bit-shashank

Reputation: 921

First find the Kth smallest number in the array.

Look at https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-2-expected-linear-time/

Above link shows how you can use randomize quick select ,to find the kth smallest element in an average complexity of O(n) time.

Once you have the Kth smallest element,loop through the array and print all those elements which are equal to or less than Kth smallest number.

    int small={Kth smallest number in the array}
    for(int i=0;i<array.length;i++){
         if(array[i]<=small){
            System.out.println(array[i]+ " ");  
       }

}

Upvotes: 0

ciamej
ciamej

Reputation: 7068

It can be done in O(nk) steps and O(1) space.

Firstly, find the kth smallest number in kn steps: find the minimum; store it in a local variable min; then find the second smallest number, i.e. the smallest number that is greater than min; store it in min; and so on... repeat the process from i = 1 to k (each time it's a linear search through the array).

Having this value, browse through the array and print all elements that are smaller or equal to min. This final step is linear.

Care has to be taken if there are duplicate values in the array. In such a case we have to increment i several times if duplicate min values are found in one pass. Additionally, besides min variable we have to have a count variable, which is reset to zero with each iteration of the main loop, and is incremented each time a duplicate min number is found.

In the final scan through the array, we print all values smaller than min, and up to count values exactly min.

The algorithm in C would like this:

int min = MIN_VALUE, local_min;
int count;
int i, j;

i = 0;
while (i < k) {
  local_min = MAX_VALUE;
  count = 0;
  for (j = 0; j < n; j++) {
    if ((arr[j] > min || min == MIN_VALUE) && arr[j] < local_min) {
      local_min = arr[j];
      count = 1;
    }
    else if ((arr[j] > min || min == MIN_VALUE) && arr[j] == local_min) {
      count++;
    }
  }
  min = local_min;
  i += count;
}

if (i > k) {
  count = count - (i - k);
}

for (i = 0, j = 0; i < n; i++) {
  if (arr[i] < min) {
    print arr[i];
  }
  else if (arr[i] == min && j < count) {
    print arr[i];
    j++;
  }
}

where MIN_VALUE and MAX_VALUE can be some arbitrary values such as -infinity and +infinity, or MIN_VALUE = arr[0] and MAX_VALUE is set to be maximal value in arr (the max can be found in an additional initial loop).

Upvotes: 3

xiawi
xiawi

Reputation: 1822

A baseline (complexity at most 3n-2 for k=3):

  • find the min M1 from the end of the list and its position P1 (store it in out[2])

  • redo it from P1 to find M2 at P2 (store it in out[1])

  • redo it from P2 to find M3 (store it in out[0])

It can undoubtedly be improved.

Upvotes: 0

Related Questions