Realist
Realist

Reputation: 509

Given an unsorted python list, how can I find the minimum set of movements required to sort it

I have a list of items stored in a remote database which may be unsorted, and I want to sort them. The database accepts commands of them form:

move item1 before item2
move item3 after item2

So, given a list of the form:

[1,3,2,7,6,0,4]

...how can I get the sequence of moves:

move 2 before 3
move 7 after 6
move 0 before 1
move 4 before 6

I assume a modification of the bubblesort algorithm would work, but I'm specifically looking for the most efficient implementation that is still pythonic, and that generates the fewest move commands.

UPDATE: the list is 1000-10000 long, and all items are unique - no repeats. Only a very small number of items - 1-10 - will be in the wrong place at any given time. Time is a concern - it should take seconds, not minutes - but it does not have to be extremely fast.

UPDATE 2: I would also like to move each item only once

Upvotes: 7

Views: 1599

Answers (3)

Abhijit
Abhijit

Reputation: 63737

As you want to reduce the number of move sequences, the optimal approach I can think of is to use binary search on a sorted list to determine the insertion point of each element. If any of the element is already in its correct position, you need not move it.

This will generate n - d sequence moves where n is the number of elements and d is the number of elements in its correct position.

  • For an already sorted list, number of sequence moves are n - d = n - n = 0
  • For a list where all the elements are in wrong position, number of sequence moves are n - d = n - 0 = n

Implementation

def gen_move(seq):
    from bisect import bisect_left
    out = seq[0:1]
    for elem in seq[1:]:
        index = bisect_left(out, elem)
        if seq[index] != elem:
            if index == 0:
                print "Move {} before {}".format(elem, out[index])
            else:
                print "Move {} after {}".format(elem, out[index - 1])
        out.insert(index, elem)
    print out

Demo

gen_move([1,3,2,7,6,0,4])
Move 2 after 1
Move 6 after 3
Move 0 before 1
Move 4 after 3
[0, 1, 2, 3, 4, 6, 7]

gen_move(range(10)[::-1])
Move 8 before 9
Move 7 before 8
Move 6 before 7
Move 5 before 6
Move 4 before 5
Move 3 before 4
Move 2 before 3
Move 1 before 2
Move 0 before 1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

gen_move(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Performance

In [5]: %timeit gen_move(range(10000, 0, -1))
10000 loops, best of 3: 84 us per loop

Time Complexity

sum(1 ln 1 + 2 ln 2 + 3 ln 3 + ..... n ln n) < O(n ln n)

Space Complexity

O(n)

Upvotes: 3

Messa
Messa

Reputation: 25191

  1. Get data from remote database
  2. Sort them (just simple sort)
  3. Use difflib SequenceMatcher.get_opcodes to get replace/delete/insert/skip operations that transform original list to sorted list
  4. Transform these operations to "move X after/before Y" operations

I am a little bit worried about time complexity of difflib, so you should benchmark it for the expected data size. Faster alternative could be rolling-checksum algorithm (like librsync).

Upvotes: 2

georg
georg

Reputation: 214969

This is pretty naive, but seems to work:

xs = [1,3,2,7,6,0,4]
ys = []

for x in xs:
    for n, y in enumerate(ys):
        if x < y:
            print x, 'before', y
            ys.insert(n, x)
            break
    else:
        ys.append(x)

Result:

2 before 3
6 before 7
0 before 1
4 before 6

Upvotes: 1

Related Questions