Ace Savage

Reputation: 45

Python disregarding zeros when sorting numbers in a list

I'm trying to sort a list of dollar amounts from lowest to highest using Python's built-in sorting, but the result comes out in the wrong order. It starts at $10,000, goes up to $19,000 (which is the highest), then jumps down to $2,000 and counts up from there, ostensibly because 2 is bigger than 1. I don't know how to correct for this. The code I've used is below.

numbers=[['$10014.710000000001'], ['$10014.83'],['$11853.300000000001'],
['$19060.010000000006'],['$2159.1099999999997'],['$3411.1400000000003']]

print(sorted(numbers))

Upvotes: 2

Views: 1142

Answers (4)

moo

Reputation: 2175

I needed to achieve a slightly simpler variant of this problem, now posting in case of use to others.

I had a directory full of files:

filenames = [
    '1.dcm', '10.dcm', '11.dcm',
    '12.dcm', '13.dcm', '14.dcm',
    '15.dcm', '16.dcm', '17.dcm',
    '18.dcm', '19.dcm', '2.dcm',
    '3.dcm', '4.dcm', '5.dcm',
    '6.dcm', '7.dcm', '8.dcm',
    '9.dcm'
]

This output from os.listdir() is not uncommon but I wanted them sorted in numerical order without needing the leading 0s. In Linux, you might achieve this with ls | sort -h

In Python, you can sort such filenames in a single line without external libraries: pass sorted a lambda as the key that strips the extra text and casts the remainder to an int:

ordered_filenames = sorted(filenames, key=lambda x: int(x.replace('.dcm', '')))
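If the extension varies, a slightly more general key could strip it with os.path.splitext instead of a hard-coded replace. This is only a sketch: the helper name numeric_stem is made up for illustration, and it assumes every filename's stem is a plain integer.

```python
import os

def numeric_stem(filename):
    # Hypothetical helper: '10.dcm' -> 10; assumes the stem is a plain integer
    stem, _ext = os.path.splitext(filename)
    return int(stem)

filenames = ['1.dcm', '10.dcm', '11.dcm', '2.dcm', '3.dcm']
ordered_filenames = sorted(filenames, key=numeric_stem)
print(ordered_filenames)  # ['1.dcm', '2.dcm', '3.dcm', '10.dcm', '11.dcm']
```

A name whose stem is not an integer (for example 'scan_a.dcm') would raise a ValueError here, so mixed directories need extra handling.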

This could be adjusted for the dollar problem:

ordered_dollar_amounts = sorted(
    dollar_amounts,
    key=lambda x: float(x.replace('$', ''))
)
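Note that the list in the question wraps each string in its own one-element list, so the key would need to index into that inner list first. A sketch under that assumption, using a shortened copy of the question's data:

```python
numbers = [['$10014.83'], ['$19060.010000000006'], ['$2159.1099999999997']]

# Each item is a one-element list, so take item[0] before stripping the '$'
ordered = sorted(numbers, key=lambda item: float(item[0].replace('$', '')))
print(ordered)
# [['$2159.1099999999997'], ['$10014.83'], ['$19060.010000000006']]
```

The original strings are left untouched; the float is only used for ordering.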

Upvotes: 0

Daniel Pryden

Reputation: 60957

The key insight here is that the values in your list are actually strings, and strings are compared lexically: each character in the string is compared one at a time until the first non-matching character. So "aa" sorts before "ab", but that also means that "a1000" sorts before "a2". If you want to sort in a different way, you need to tell the sort method (or the sorted function) what it is you want to sort by.
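To see the lexical comparison in action, here is a small sketch with made-up amounts (rounded versions of the values in the question):

```python
amounts = ['$10014.83', '$2159.11', '$19060.01']

# Strings compare character by character: '$1...' sorts before '$2...'
# no matter how many digits follow
string_less = '$10014.83' < '$2159.11'
print(string_less)        # True

lexical_order = sorted(amounts)
print(lexical_order)      # ['$10014.83', '$19060.01', '$2159.11']

# Actual numbers compare by value
number_less = 10014.83 < 2159.11
print(number_less)        # False
```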

In this case, you should probably use the decimal module, together with the key argument of the sort method. This sorts your existing list in place, using the converted values only during the sorting process.

import decimal

def extract_sortable_value(value):
    # value is a list, so take the first element
    first_value = value[0]
    return decimal.Decimal(first_value.lstrip('$'))

numbers.sort(key=extract_sortable_value)

Equivalently, you could do:

print(sorted(numbers, key=extract_sortable_value))

Demo: https://repl.it/repls/MiserableDarkPatches

Upvotes: 4

finefoot

Reputation: 11242

Your numbers are currency values. So as pointed out in the comments below, it might make sense to use Python's decimal module which offers several advantages over the float datatype. (See link for further information.)
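One of those advantages, shown as a quick sketch: binary floats cannot represent most decimal fractions exactly, while a Decimal built from a string keeps the digits as written.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum is slightly off
float_sum = 0.1 + 0.2
print(float_sum == 0.3)                  # False

# Decimal values built from strings keep the decimal digits exactly
decimal_sum = Decimal('0.1') + Decimal('0.2')
print(decimal_sum == Decimal('0.3'))     # True
```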


If, however, this is only an exercise for getting to know Python better, as I suspect, you might look for a simpler solution:

Your sorting doesn't work because each number is stored as a string inside its own inner list. You have to convert them to integers or floats before sorting has the effect you're looking for:

numbers=[
    ['$10014.710000000001'],
    ['$10014.83'],
    ['$11853.300000000001'],
    ['$19060.010000000006'],
    ['$2159.1099999999997'],
    ['$3411.1400000000003']
]

numbers_float = [float(number[0][1:]) for number in numbers]
numbers_float.sort()

print(numbers_float)

Which prints:

[2159.1099999999997, 3411.1400000000003, 10014.710000000001, 10014.83, 11853.300000000001, 19060.010000000006]

In float(number[0][1:]), [0] takes the first (and only) element of the inner list, [1:] strips the $ sign, and float converts the result to a floating-point number.

If you want the $ sign back:

for number in numbers_float:
    print("${}".format(number))

Which prints:

$2159.1099999999997
$3411.1400000000003
$10014.710000000001
$10014.83
$11853.300000000001
$19060.010000000006
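If the long tails of digits are unwanted, the format specification can also round to two decimal places and add thousands separators. A small sketch, assuming cents are the intended precision (the question does not say so), with a shortened copy of the list:

```python
numbers_float = [2159.1099999999997, 3411.1400000000003, 10014.710000000001]

# ',' adds thousands separators, '.2f' rounds to two decimal places
formatted = ["${:,.2f}".format(number) for number in numbers_float]
for line in formatted:
    print(line)
# $2,159.11
# $3,411.14
# $10,014.71
```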

Upvotes: 2

user8408080

Reputation: 2468

You are not sorting numbers but strings, which explains the "weird" result. Instead, change your type to float and sort the resulting list:

In [3]: sorted([[float(el[0][1:])] for el in numbers])
Out[3]: 
[[2159.1099999999997],
 [3411.1400000000003],
 [10014.710000000001],
 [10014.83],
 [11853.300000000001],
 [19060.010000000006]]

I need the el[0] because every number is inside its own list, which is not good style, but I guess you have your reasons for this. The [1:] strips away the $ sign.

EDIT: A really good point was made in the comments. A more robust solution:

from decimal import Decimal
import decimal

# prec only affects the results of arithmetic; Decimal(str) is constructed exactly
decimal.getcontext().prec = 4

sorted([Decimal(el[0][1:]) for el in numbers])
Out[8]: 
[Decimal('2159.1099999999997'),
 Decimal('3411.1400000000003'),
 Decimal('10014.710000000001'),
 Decimal('10014.83'),
 Decimal('11853.300000000001'),
 Decimal('19060.010000000006')]
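The values above keep all their digits. If the amounts should be rounded to whole cents (an assumption about intent, not something the question states), one option is Decimal.quantize. The sketch below uses a shortened copy of the data and assumes the default context precision; with prec set to 4, quantize would raise InvalidOperation for these values because the result needs more than four digits.

```python
from decimal import Decimal, ROUND_HALF_UP

numbers = [['$10014.710000000001'], ['$2159.1099999999997'], ['$3411.1400000000003']]

cents = Decimal('0.01')  # target exponent: two decimal places
rounded = sorted(
    Decimal(el[0][1:]).quantize(cents, rounding=ROUND_HALF_UP) for el in numbers
)
print(rounded)
# [Decimal('2159.11'), Decimal('3411.14'), Decimal('10014.71')]
```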

Upvotes: 2
