Jomonsugi

Reputation: 1279

sort a list containing strings with digits at beginning and end of string

I need to sort a list of strings which contains digits at the beginning and end of the string, first by the beginning digits, then by the ending digits. So the beginning digits have priority over the ending digits.

For example:

    l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

Would become:

    l = ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']

I know that l.sort() will not work here because it compares the strings lexicographically, character by character. Every other method I tried seemed excessively complicated (for example: splitting the strings by matching beginning digits, then splitting again by ending digits, sorting, concatenating, and then recombining the list). Even summarizing that method shows that it is not efficient!
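To make the problem concrete, here is a small sketch of what the default string comparison produces for the example list. The variable name `lexicographic` is only used here for illustration.

```python
l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

# Strings are compared character by character, so '1000...' sorts before
# '900...' because '1' < '9', and 'abc10' sorts before 'abc5' because '1' < '5'.
lexicographic = sorted(l)
print(lexicographic)
# ['1000abc10', '1000abc5', '3000abc10', '900abc20', '900abc5']
```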

Edit: after playing around with the natsort module I found that natsorted(l) solves my particular issue.

Upvotes: 0

Views: 1263

Answers (3)

Moinuddin Quadri

Reputation: 48047

You can create a custom function that extracts the numbers from a string and pass that function as the key to sorted().

For example, the function below uses a regex to extract the numbers and convert them to integers:

    import re

    def get_nums(my_str):
        return list(map(int, re.findall(r'\d+', my_str)))

Refer to "Python: Extract numbers from a string" for more alternatives.

Then call sorted() with get_nums as the key:

    >>> l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']
    >>> sorted(l, key=get_nums)
    ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']

Note: Based on your example, this regex assumes that numbers appear only at the start and the end of each string, and that every character in between is non-numeric. Any digits in the middle of a string would be extracted as well and would take part in the comparison.
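If the strings might contain digits in the middle, one possible variation is to keep only the first and the last digit run. This is a minimal sketch, not part of the approach above: `first_last_nums` and the sample list `items` are hypothetical names, and the function assumes every string contains at least one digit run (otherwise it raises an IndexError).

```python
import re

def first_last_nums(my_str):
    # Hypothetical helper: use only the first and last digit runs as the key,
    # so numbers in the middle of the string do not affect the order.
    nums = re.findall(r'\d+', my_str)
    return (int(nums[0]), int(nums[-1]))

# Made-up sample data with digits in the middle of two strings.
items = ['900a7bc5', '900a1bc20', '1000abc5']

result = sorted(items, key=first_last_nums)
print(result)
# ['900a7bc5', '900a1bc20', '1000abc5']
```

With a key built from all digit runs, '900a1bc20' would come before '900a7bc5', because the middle numbers 1 and 7 would be compared before the trailing ones.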

Upvotes: 4

akuiper

Reputation: 214927

Here is an option that uses regex to find the leading digits and the trailing digits and uses them, as a tuple, for the key in the sorted function:

    import re

    # Raw strings keep \d from being read as a string escape sequence.
    sorted(l, key=lambda x: (int(re.findall(r"^\d+", x)[0]), int(re.findall(r"\d+$", x)[0])))
    # ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']
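Indexing the result of `re.findall` with `[0]` raises an IndexError when a string has no leading or trailing digits. If that can happen in your data, a variation along these lines may help. It is only a sketch: `lead_trail_key`, the `default` value of -1, and the sample list `items` are assumptions made for illustration.

```python
import re

LEADING = re.compile(r'^\d+')
TRAILING = re.compile(r'\d+$')

def lead_trail_key(s, default=-1):
    # Hypothetical helper: `default` is an assumed placeholder that makes
    # strings without a leading or trailing number sort first.
    lead = LEADING.search(s)
    trail = TRAILING.search(s)
    return (int(lead.group()) if lead else default,
            int(trail.group()) if trail else default)

# Made-up sample data, including strings that lack one or both numbers.
items = ['900abc5', 'abc', '900abc', '1000abc5']

result = sorted(items, key=lead_trail_key)
print(result)
# ['abc', '900abc', '900abc5', '1000abc5']
```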

Upvotes: 1

phss

Reputation: 1022

Python's sorted function accepts a key parameter, which should be a function that transforms each element of the list into the value used for sorting. In your case, you want to sort by the numbers in the string. For example, for '900abc5' the key would be [900, 5], and so on. So you want to pass in a key function that transforms the string into a list of numbers.

Using regular expressions, it's quite easy to extract the digits from the string. All you need to do is to map the digits into actual numbers, as regular expressions return string matches.

I believe the code below should work:

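    import re

    l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

    def by_digits(e):
        digits_as_string = re.findall(r"\d+", e)
        # Return a list: in Python 3, map() gives a lazy iterator,
        # and sorted() cannot compare two map objects.
        return list(map(int, digits_as_string))

    sorted(l, key=by_digits)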
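To see what such a key function produces, the sketch below prints the key for each element. It builds a list explicitly, since a bare `map` object is not comparable under Python 3. The names `by_digits_list` and `keys` are introduced here for illustration only.

```python
import re

l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

def by_digits_list(e):
    # Every run of digits becomes one integer in the key.
    return [int(d) for d in re.findall(r"\d+", e)]

keys = {e: by_digits_list(e) for e in l}
for e, k in keys.items():
    print(e, k)
# 900abc5 [900, 5]
# 3000abc10 [3000, 10]
# 1000abc5 [1000, 5]
# 1000abc10 [1000, 10]
# 900abc20 [900, 20]

# Lists are compared element by element, so the first number takes priority.
result = sorted(l, key=by_digits_list)
print(result)
# ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']
```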

Upvotes: 0
