developer.cyrus
developer.cyrus

Reputation: 903

Sort a "sorted" array

  1. Suppose given an array of size n, with sorted values.
  2. In iteration i, a new random-generated value is given, and inserted into the end of the array.
  3. The array is then resorted, and discard the least value item.
  4. After iteration n, the retained array will contain the largest value items.

For example, in Java syntax, it will be something like:

List l = new ArrayList();
l.add(new Integer(2));
l.add(new Integer(3));
l.add(new Integer(6));
l.add(new Integer(9));

Random rand = new Random();
for (int i=0; i < n; i++) {
  l.add(new Integer(rand.nextInt(1000)));
}    
Collections.sort(l);
l.remove(0);

But it seems it's inefficient. Any better algorithm?

Upvotes: 2

Views: 1875

Answers (16)

tanascius
tanascius

Reputation: 53964

Use a binary insert (works like a binary search) for the new value. Discard the smallest. Should be quite fast.

By the way - this can be implemented as a handy extension method:

private static int GetSortedIndex( this IList list, IComparer comparer, object item, int startIndex, int endIndex )
{
  if( startIndex > endIndex )
  {
    return startIndex;
  }
  var midIndex = startIndex + ( endIndex - startIndex ) / 2;
  return comparer.Compare( list[midIndex], item ) < 0 ?
    GetSortedIndex( list, comparer, item, midIndex + 1, endIndex ) :
    GetSortedIndex( list, comparer, item, startIndex, midIndex - 1 );
}

public static void InsertSorted( this IList list, IComparer comparer, object item )
{
  list.Insert( list.GetSortedIndex( comparer, item ), item );
}

Java Equivalent

public static void main(String[] args)
{
   List l = new ArrayList();
   l.add(new Integer(2));
   l.add(new Integer(3));
   l.add(new Integer(6));
   l.add(new Integer(9));

   Random rand = new Random();
   for (int i=0; i < 10; i++) {
       Integer rnd = new Integer(rand.nextInt(1000));
       int pos = Collections.binarySearch(l,rnd);
       if(pos < 0) pos = ~pos;
       l.add(pos,rnd);
   }    
   System.out.println(l);
}

Upvotes: 13

Peter Lawrey
Peter Lawrey

Reputation: 533870

Here is another solution which merges the operations into just a search, an array copy and a value set. This avoid the need for sorting or loops.

public static <T extends Comparable<T>> 
        void insertAndRemoveSmallest(T[] array, T t) {
    int pos = Arrays.binarySearch(array, t);
    if (pos < 0) pos = ~pos;
    // this is the smallest entry so no need to add it and remove it.
    if (pos == 0) return;
    pos--;
    // move all the entries down one.
    if (pos > 0) System.arraycopy(array, 1, array, 0, pos);
    array[pos] = t;
}

This program

public static void main(String... args) {
    Integer[] ints = {2, 3, 7, 6, 9};
    System.out.println("Starting with " + Arrays.toString(ints));
    for (int i : new int[]{5, 1, 10, 8, 8}) {
        insertAndRemoveSmallest(ints, i);
        System.out.println("After adding " + i + ": " + Arrays.toString(ints));
    }
}

prints

Starting with [2, 3, 7, 6, 9]
After adding 5: [3, 5, 7, 6, 9]
After adding 1: [3, 5, 7, 6, 9]
After adding 10: [5, 7, 6, 9, 10]
After adding 8: [7, 6, 8, 9, 10]
After adding 8: [6, 8, 8, 9, 10]

Upvotes: 0

Gareth Davis
Gareth Davis

Reputation: 28069

Use a TreeSet instead of a List, it'll maintain the order such that such that the largest value will always be at SortedSet#last(). If using 1.6+ you can use NavigableSet methods; pollLast() will return and remove the highest value.

NavigableSet<Integer> set = new TreeSet<Integer>();

//... setup data

Integer highest = set.pollLast();

set.add(rand.nextInt(1000));

Integer newHighest = set.pollLast();

Upvotes: 8

Jay
Jay

Reputation: 27492

A key question is whether you need to know the top 4 items AFTER EVERY NEW ITEM IS GENERATED, or if you only need the top 4 after all the items are generated. Also, is it literally 4 top items, or is that just an example or illustration?

Because if you really are generating thousands of values and only want the top 4, I'd think that comparing each new value to each of the existing 4 and discarding if less than all of them would be a lot faster than doing many sorts. That's just 4 compares for each new item, rather than the potentially much larger number to do repeated sorts.

Similarly if you only need the top N at the end of the process, it may well be faster to collect them all, sort, and then take the top N. But again, if most of the values are being eliminated, sorting the relative positions of the "losers" could be a big waste of time. If we only want the top 4, then whether an item is #5 or #10,382,842 is irrelevant.

Upvotes: 0

Bojan Resnik
Bojan Resnik

Reputation: 7388

The fastest algorithm I can think of would be to replace the smallest element with the new one, if needed, and push the new one to its proper place by repeatedly swapping with adjacent elements.

EDIT: The code assumes that the array is sorted in descending order, and thus the last element is the smallest.

void Insert(int[] array, int newValue)
{
    // If the new value is less than the current smallest, it should be
    // discarded
    if (new_value <= array[array.length-1])
        return;

    array[array.length-1] = newValue;
    for (int i = array.length-1; i > 0; --i)
    {
        if (newValue <= array[i-1])
            break;

        // Swap array[i] with array[i-1]
        array[i] = array[i-1];
        array[i-1] = newValue;
    }
}

Upvotes: 2

Bert F
Bert F

Reputation: 87603

Collections.binarySearch()

ArrayList.ensureCapcity()

Your pseudocode inserts a set of new items N into sorted list A of size S and then discards the smallest item. Use Collections.binarySearch() to find the insertion point. [Read the note the performance impact if your List is does not support RandomAccess. ArrayList does support RandomAccess.]

List<Integer> l = new ArrayList<Integer>();
l.add(new Integer(2));
l.add(new Integer(3));
l.add(new Integer(6));
l.add(new Integer(9));

l.ensureCapacity(l.size()+n);

Random rand = new Random();
for (int i=0; i < n; i++) {
  final Integer newInt = Integer.rand.nextInt(1000);
  int insertPoint = Collections.binarySearch(l, newInt);
  if (insertPoint < 0)  insertPoint = -(insertPoint + 1);
  l.add(insertPoint, newInt);
}
l.remove(0);

But, are you sure you wanted to discard just 1 item? Or did you mean to insert a set of new items N into sorted list A of size S and keep just the S largest items. In that case, keep track of the min value:

int min = l.get(0);
l.ensureCapacity(l.size()+n);

Random rand = new Random();
for (int i=0; i < n; i++) {
  final Integer newInt = Integer.rand.nextInt(1000);
  if (newInt > min) {
    int insertPoint = Collections.binarySearch(l, newInt);
    if (insertPoint < 0)  insertPoint = -(insertPoint + 1);
    l.add(insertPoint, newInt);
  }
}

However, if N is large, you may be better off sorting N into an sorted array by itself, discarding the smaller of N(0) or A(0), and then merging the two sorted arrays together [left as an exercise for the reader].

If you end up using an actual array, see Arrays.binarySearch and System.arraycopy.

Upvotes: 2

Noldorin
Noldorin

Reputation: 147461

I'm quite surprised no-one has mentioned this yet... The data structure you are looking for is a priority queue. It is without doubt the most efficient way of accomplishing this task. A priority queue can be implemented using a number of different methods (see the linked article), but the most common is based on a binary heap. In the self-binary variety (which is quite typical), insertion and deletion both take O(log n) time.

There seems to be a built-in generic class in the Java library, PriorityQueue<E>, so it would seem you can use this directly. This type does not surprisingly seemed to be based on a heap data structure, though more specific than that I cannot say. It should be very appropiate for your use, in any case.

Upvotes: 5

meade
meade

Reputation: 23331

I'm not sure if the above example would work, what is n? and if you cycle through adding random #'s from 1 to 1,000 you'll always end up with 1000, 999, 998 and 997 - no? I don't think adding a # and then resorting each time is efficient- it would probably be quicker to check each of the four positions and replacing with a higher #.

A lot depends on how many random #'s you'll add, to few # adds and check each of the 4 positions a lot of # adds just assume you get the highest in the range.

Upvotes: 0

Jason Watkins
Jason Watkins

Reputation: 2702

Do you really need an online one item at a time algorithm? Or are you actually parsing a larger collection of data and just want the top n items? If it's the latter, look at partial qsort.

Upvotes: 0

Dario
Dario

Reputation: 49228

ShellSort and Natural Mergesort are very performant (< O(n logn)) on largely pre-sorted data. Inserting into a sorted list with binary search needs much more time since one update requires O(n) anyway.

Alternatively, you could use heap-datastructures.

Upvotes: 0

cletus
cletus

Reputation: 625387

This will keep the size at 4 and do what you want as I understand it.

SortedSet<Integer> set = new TreeSet<Integer>();
set.add(2);
set.add(3);
set.add(6);
set.add(9);
Random rand = new Random();
for (int i=0; i < n; i++) {
  int i = rand.nextInt(1000);
  set.remove(set.first());
  set.add(i);
}    

Upvotes: 1

Thorarin
Thorarin

Reputation: 48516

A very simple optimalization would be to compare the lowest value in the sorted array (should thus be the first item) with the new value before inserting it. If the new value is greater than this value, replace the element with the new value and then resort the array.

Upvotes: 3

Svante
Svante

Reputation: 51541

I don't know whether you can change the data structure, or what other operations you need to support, but a heap would be a better fit for the kind of operations you describe.

Upvotes: 1

David Johnstone
David Johnstone

Reputation: 24470

If you're working with an ArrayList, you can replace the last number in the array with the new number if the new number is larger before you sort the array.

The Java Collections.sort uses merge sort which isn't the most efficient way of sorting in this situation. You want to use a binary search to find the insertion point and then shift all the subsequent numbers along by one.

EDIT: This can all be done with just an array like so:

public static int addDiscard(int[] list, int number)
{
    if (number > list[list.length - 1])
    {
        int index = findInsertionIndex(list, number); // use binary search
        for (int i = list.length - 1; i > index; i--)
        {
            list[i] = list[i - 1];
        }
        list[index] = number;
    }
}

Upvotes: 1

S.P
S.P

Reputation: 8766

Use a min-heap to store the data, and after each insertion of a new random value, delete the min in O(1) time.

After n iterations, perform n extract-min's to get the sorted list.

Upvotes: 5

Edouard A.
Edouard A.

Reputation: 6128

You can use binary search to insert a value into a sorted array.

Upvotes: 1

Related Questions