lateralus

Reputation: 1030

Ordering a string by its substring numerical value in python

I have a list of strings that I need to sort numerically, using two numeric substrings of each string as the integer key. Calling sort() with no arguments orders the strings alphabetically, so I get 1, 10, 2, ..., which is not what I'm looking for.

Searching around, I found that a key parameter can be passed to sort(), and that sort(key=int) should do the trick. But since my key is a substring rather than the whole string, that would raise a conversion error.
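For illustration, a minimal sketch of that failure, using the sample strings listed below. The variable names `strings`, `plain`, and `error` are placeholders chosen for this example only:

```python
strings = ["test1txtfgf10", "test1txtfgg2", "test2txffdt3", "test2txtsdsd1"]

# A plain sort compares the strings character by character.
plain = sorted(strings)

# key=int tries to convert each whole string, which is not a valid integer literal.
try:
    sorted(strings, key=int)
    error = None
except ValueError as exc:
    error = exc
```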

Supposing my strings are something like:

test1txtfgf10
test1txtfgg2
test2txffdt3
test2txtsdsd1

I want my list to be ordered in numeric order on the basis of the first integer and then on the second, so I would have:

test1txtfgg2
test1txtfgf10
test2txtsdsd1
test2txffdt3

I think I could extract the integer values, sort them while keeping track of which string each belongs to, and then reorder the strings, but I was wondering whether there is a more efficient and elegant way to do this.

Thanks in advance

Upvotes: 1

Views: 149

Answers (3)

vaultah

Reputation: 46553

Try the following

In [26]: import re

In [27]: f = lambda x: [int(x) for x in re.findall(r'\d+', x)]

In [28]: sorted(strings, key=f)
Out[28]: ['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']

This uses regex (the re module) to find all integers in each string, then compares the resulting lists. For example, f('test1txtfgg2') returns [1, 2], which is then compared against other lists.
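The same idea as a standalone script rather than an IPython session, as a rough sketch. The names `numeric_key`, `keys`, and `result` are made up for this example; it relies on Python comparing lists element by element:

```python
import re

def numeric_key(s):
    """Return every run of digits in s as a list of ints."""
    return [int(n) for n in re.findall(r"\d+", s)]

strings = ["test1txtfgf10", "test1txtfgg2", "test2txffdt3", "test2txtsdsd1"]

# One key per string, in the original order.
keys = [numeric_key(s) for s in strings]

# Lists compare element by element, so [1, 2] < [1, 10] < [2, 1] < [2, 3].
result = sorted(strings, key=numeric_key)
```

Because the key is a list of every integer found, this also works for strings with more or fewer than two numbers, although a string with no digits at all gets an empty key and sorts first.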

Upvotes: 4

Narcisse Doudieu Siewe

Reputation: 649

import re
k = [
     "test1txtfgf10",
     "test1txtfgg2",
     "test2txffdt3",
     "test2txtsdsd1"
    ]

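# Pair each string with its numeric parts converted to ints.
tmp = [([int(e) for e in re.split("[a-z]", el) if e], el) for el in k]
# Sort in place by the numeric parts (sorted() would discard its result here).
tmp.sort(key=lambda pair: pair[0])
# Keep only the original strings, now in numeric order.
tmp = [res for cm, res in tmp]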

Upvotes: 0

user1907906

Reputation:

Extract the numeric parts and sort using them

import re

d = """test1txtfgf10
test1txtfgg2
test2txffdt3
test2txtsdsd1"""

lines = d.split("\n")

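# Raw string so that \d is passed to the regex engine unchanged.
re_numeric = re.compile(r"^[^\d]+(\d+)[^\d]+(\d+)$")

def key(line):
    """Returns a tuple (n1, n2) of the numeric parts of line."""
    m = re_numeric.match(line)
    if m:
        # group(i) returns the i-th captured substring; groups() returns a tuple.
        return (int(m.group(1)), int(m.group(2)))
    else:
        # An empty tuple sorts before any (n1, n2) and stays comparable in Python 3.
        return ()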

lines.sort(key=key)

Now lines are

['test1txtfgg2', 'test1txtfgf10', 'test2txtsdsd1', 'test2txffdt3']

Upvotes: 0
