Rafael Santos
Rafael Santos

Reputation: 313

Sorting a Python list of numbers without knowing their values, but only their relationships among each other

I have a list of numbers which I can't really know their real values. Let's say list = [a,b,c,d] . But I have information about how they relate to each other:

a > b
b > d
d > c

Therefore I can infer that list sorted in descending order is [ a, b, d, c].

Another more complicated example

relationships =
 - g > b
 - d > c > a
 - d > g
 - f > e > a
 - f > e > d

In this case we have multiple possible answers

            Result: 
  [ [f, e, d, c, a, g, b],
    [f, e, d, g, c, a, b],
    [f, e, d, g, c, b, a],
  ... ]

just to name a few

I want the function to return me exactly that: a list of all possible answers. The relationships is a list of lists, where each list representes the relationship of the sorting in descending order. For example relationships = [[a,b],[b,c]] tells me that "a > b" and "b > c" . The inner lists inside relationships don't have to be necessarily of the same size. In case of an invalid/impossible input the algorithm should thrown an error. For example:

relationships = [[a,b],[b,c],[c,a] is an impossible case

What's the best approach for this that is also efficient ? I was thinking about using graphs theory where the graphs cannot be acyclic. For example A -> B means Node A goes to B meaning A > B , therefore B -> A cannot exist. Somehow like a binary tree, but in this case instead of allowing 2 child per node we can only allow 1.

Is that a good idea to start on this or does anyone have any better idea on how to approach this?

Upvotes: 1

Views: 398

Answers (2)

Paddy3118
Paddy3118

Reputation: 4772

I wrote Python code for Topological sort on site Rosetta Code here.

The input is a dict mapping nodes to the set of nodes it depends on, (in your case, nodes they are "greater than"). use the following to represent your example dependencies:

a,b,c,d,e,f,g = 'abcdefg'  # For ease of use
data = {c:a, d:'cag', e:'ad', f:'ead', g:b}  # <node>:<is_greater_than>
dat = {key: set(val) for key, val in data.items()}
print ('\n'.join( toposort2(dat) ))

The correct output is the following, where nodes on the same line can appear in any order before nodes from other lines above it:

a b
c g
d
e
f

Note your solution you give to your example is wrong in that you cannot have f,e,d then followed immediately by b; it must by c or g (in any order); then a or b (in any order).

Upvotes: 1

btilly
btilly

Reputation: 46399

You need 3 ideas to understand this.

First is topological sorting via Kahn's algorithm. See https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm for a description. I am walking through all of the way to generate a topological sort by that algorithm, and yielding each one.

The second is stack based programming. Think Forth, or RPN. The idea is that I have a stack of actions. At each step I take the top action off of the todo list and do it. If I add items onto the todo list, then that is like having recursive calls. In my case the actions are to choose (try all choices that make sense), add (add an element to the sorted list and do bookkeeping), remove (remove an element from the sorted list and do bookkeeping) and try (schedule the actions add/choose/remove - but placed in the opposite order because stack).

The third is generators. I just yield multiple times and then continue from where I was. This can be looped over, turned into a list, etc.

Here is the code.

def topological_sorts (descending_chains):
    arrive_to = {}
    outstanding_arrivals = {}
    for chain in descending_chains:
        for x in chain:
            if x not in arrive_to:
                arrive_to[x] = set([])
                outstanding_arrivals[x] = 0
        for i in range(1, len(chain)):
            arrive_to[chain[i-1]].add(chain[i])

    for item in arrive_to:
        for x in arrive_to[item]:
            outstanding_arrivals[x] += 1

    todo = [['choose', None]]
    sorted_items = []
    chosen = set([])
    items = [x for x in arrive_to]
    while len(todo):
        action, item = todo.pop()
        if action == 'choose':
            choices = []
            for x in outstanding_arrivals:
                if x not in chosen and 0 == outstanding_arrivals[x]:
                    choices.append(x)
            if 0 == len(choices):
                if len(sorted_items) == len(items):
                    yield sorted_items
                else:
                    print((choices, outstanding_arrivals, sorted_items))
                    break
            else:
                for item in choices:
                    todo.append(['try', item])
        elif action == 'try':
            chosen.add(item)
            sorted_items.append(item)
            todo.append(['remove', item])
            todo.append(['choose', None])
            todo.append(['add', item])
        elif action == 'add':
            chosen.add(item)
            for x in arrive_to[item]:
                outstanding_arrivals[x] -= 1
        elif action == 'remove':
            chosen.remove(item)
            sorted_items.pop()
            for x in arrive_to[item]:
                outstanding_arrivals[x] += 1
        else:
            yield ('todo:', action, item)

and here is how to use it for your second example.

for sort in topological_sorts([
        ['g', 'b'],
        ['d', 'c', 'a'],
        ['d', 'g'],
        ['f', 'e', 'a'],
        ['f', 'e', 'd']]):
    print(sort)

Upvotes: 2

Related Questions