Reputation: 1035

Order a list by all item's digits in Python

I want to sort a list by each item's digits.

Example:

myCmpItem = '511'
myList = ['111','222','333','444','555','123']

(some magic)

mySortedList = ['111', '222', '333', '123', '444', '555']

How the algorithm should work:

Compare each digit of current item in myList with myCmpItem
- For the first item in the list it would be like that:
- Difference between 5 and 1 is 4
- Difference between 1 and 1 is 0
- Difference between 1 and 1 is 0
- Difference between those two numbers is 4 (the sum of the digit comparison)
Do the same for all other items
Order the list by this calculated similarity
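
The steps above can be sketched as follows. This is only a minimal illustration of the described procedure, not code from the question; `digit_distance` is a hypothetical helper name.

```python
# Hypothetical helper: sum of the absolute per-digit differences.
def digit_distance(item, reference):
    return sum(abs(int(a) - int(b)) for a, b in zip(item, reference))

myCmpItem = '511'
myList = ['111', '222', '333', '444', '555', '123']

# Distance of every item to myCmpItem.
distances = {item: digit_distance(item, myCmpItem) for item in myList}
# '111' -> 4, '222' -> 5, '333' -> 6, '444' -> 7, '555' -> 8, '123' -> 7

# sorted() is stable, so items with equal distance keep their input order.
mySortedList = sorted(myList, key=lambda item: digit_distance(item, myCmpItem))
```

Note that '444' and '123' both have a distance of 7, so the distance alone does not decide their relative order; a stable sort keeps '444' ahead of '123' because it comes first in the input.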

I could code this with a lot of for-loops, but I am looking for a faster way to do this. Is there an algorithm that does something like this, and does it fast?

Further Limitations

In my example all items have a length of 3, in the real scenario they have a length of 25
All items have the same length, len(myList[x])==25 is always true
Items can be strings, ints, floats or whatever fits better to the algorithm
There are only digits between 1 and 5

Background

Each item's digits are answers to questions, and I want to find the answer set most similar to a given answer set. So "123" means that a user gave Answer 1 to Question 1, Answer 2 to Question 2 and Answer 3 to Question 3. These are multiple-choice questions, 25 in total (hence the length of 25), and there are always 5 possible answers (the digits 1-5).
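
If only the most similar answer set (or the few most similar ones) is needed, a full sort may not be necessary. The sketch below assumes the same per-digit distance as in the example; `digit_distance` is a hypothetical helper name, while `min` and `heapq.nsmallest` are standard Python.

```python
import heapq

# Hypothetical helper: sum of the absolute per-digit differences.
def digit_distance(item, reference):
    return sum(abs(int(a) - int(b)) for a, b in zip(item, reference))

myCmpItem = '511'
myList = ['111', '222', '333', '444', '555', '123']

def key(item):
    return digit_distance(item, myCmpItem)

# Single closest answer set.
closest = min(myList, key=key)

# The three closest answer sets, nearest first.
top_three = heapq.nsmallest(3, myList, key=key)
```

Both calls evaluate the key once per item, so any faster key function from the answers can be dropped in unchanged.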

PS: This is the first question I have asked on Stack Overflow, so please be kind to me. I googled for hours but could not find a solution, so I asked here. I hope that is fine. Also, English is not my native language.

The Answer (thanks to all participants!)

@larsmans' answer (https://stackoverflow.com/a/10790714/511484) explains very well how to solve this with reasonable speed. You can even speed up the algorithm by calculating the distances between every pair of digits in advance; see @gnibbler's post (https://stackoverflow.com/a/10791838/511484). All the other answers were also nice and correct, but I found that @larsmans had the best explanation. Thanks everybody once again for the help!

Upvotes: 5

Answers (5)

John La Rooy

Reputation: 304137

Precomputing a table of distances may be faster than converting every digit to int:

myCmpItem = '511'
myList = ['111','222','333','444','555','123']

# only need to compute this dict once
dists = {(i,j):abs(int(i)-int(j)) for i in '12345' for j in '12345'}

print(sorted(myList, key=lambda item: sum(dists[pair] for pair in zip(item, myCmpItem))))

On my computer, this is 2.9 times faster than larsmans' answer for 100000 strings of 25 characters each.
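
A speed claim like this can be checked on your own machine with a small `timeit` harness. The sketch below is only an illustration: the data size, the random seed and the helper names `key_int` and `key_table` are assumptions, not part of the answer, and the measured ratio will differ between machines and Python versions.

```python
import random
import timeit

random.seed(0)  # fixed seed so both variants sort identical data
reference = ''.join(random.choice('12345') for _ in range(25))
items = [''.join(random.choice('12345') for _ in range(25))
         for _ in range(10000)]

# Lookup table: distance for every pair of digit characters.
dists = {(a, b): abs(int(a) - int(b)) for a in '12345' for b in '12345'}

def key_int(item):
    # Converts every digit to int on every call.
    return sum(abs(int(a) - int(b)) for a, b in zip(item, reference))

def key_table(item):
    # Looks up the precomputed distance for each character pair.
    return sum(dists[pair] for pair in zip(item, reference))

by_int = sorted(items, key=key_int)
by_table = sorted(items, key=key_table)

t_int = timeit.timeit(lambda: sorted(items, key=key_int), number=3)
t_table = timeit.timeit(lambda: sorted(items, key=key_table), number=3)
print("int conversion: %.3fs, lookup table: %.3fs" % (t_int, t_table))
```

Both key functions compute the same distance, so the two sorted lists should be identical; only the timings differ.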

Upvotes: 2

cval

Reputation: 6809

With lambda and list comprehension:

sorted(myList, key=lambda item: sum([abs(int(x) - int(y)) for x, y in zip(item, myCmpItem)]))

Upvotes: 4

Fred Foo

Reputation: 363497

First, make a list of integers from myCmpItem to make subtraction possible.

myCmpItem = list(map(int, myCmpItem))

Then, define a function that calculates the distance between an item and myCmpItem. We need to map the items to lists of integers as well. The rest is just the vanilla formula for L1 distance (the mathematical name of the "difference" you're computing).

def dist(item):
    item = list(map(int, item))
    return sum(abs(item[i] - myCmpItem[i]) for i in range(len(item)))

Then, use this function as a key function for sorting.

sorted(myList, key=dist)

(PS: are you sure L1 distance makes sense for this application? Using it expresses the assumption that answer 1 is more similar to answer 2 than to answer 3, etc. If that's not the case, Hamming distance might be more appropriate.)
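
For comparison, a Hamming-distance key could look like the sketch below. It simply counts the positions at which two answer sets differ, regardless of how far apart the answers are. The name `hamming` is a hypothetical helper, not part of the answer above.

```python
def hamming(item, reference):
    # Number of positions where the two strings differ
    # (True counts as 1 when summed).
    return sum(a != b for a, b in zip(item, reference))

myCmpItem = '511'
myList = ['111', '222', '333', '444', '555', '123']

by_hamming = sorted(myList, key=lambda item: hamming(item, myCmpItem))
# '111' differs in 1 position, '555' in 2, all others in 3
```

With this key, '555' ranks second because it shares the first answer with '511', whereas under L1 distance it ranks last.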

Upvotes: 7

Jochen Ritzel

Reputation: 107598

def cmpWith(num):
    def compare(item):
        """ calculate the difference between num and item """
        return sum(
            abs(int(n) - int(x))  # cast to int to make the subtraction possible
            for x, n in zip(item, num)  # zip pairs up the characters of both strings
        )

    return compare

lst = ['111','222','333','444','555','123']
print(sorted(lst, key=cmpWith('511')))

Upvotes: 4

Nick Craig-Wood

Reputation: 54079

How about this?

myCmpItem = '511'
myList = ['111','222','333','444','555','123']

def make_key(x):
    diff = 0
    for a, b in zip(x, myCmpItem):
        diff += abs(int(a)-int(b))
    return diff

mySortedList = sorted(myList, key=make_key)

print(mySortedList)