rNOde

Reputation: 59

Splitting and sorting a list based on substring

I'm trying to take a list from an array and sort its strings by the last run of six digits (for instance '042126'). To do this I would split each string on '.', take the second-to-last piece ([-2]), and then sort matchfiles[1] by that substring.

The files should end up sorted like: erl1.041905, erl1.041907, erl2.041908, erl1.041909, erl2.041910, etc.

Two questions: how do I allow an unlimited number of splits per string (in case longer names contain additional '.' characters)? I am using 4 splits, but that may not always hold. Alternatively, how would I split just twice, working backwards from the end?

More importantly, I get an error: 'list' object is not callable. What am I doing wrong?

Thanks

matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png', 
                        'blue.2017-09-05t15-15-11.erl1.041907.png', 
                        'blue.2017-09-05t15-15-14.erl1.041909.png', 
                        'blue.2017-09-05t14-21-35.erl2.041908.png', 
                        'blue.2017-09-05t14-21-38.erl2.041910.png', 
                        'blue.2017-09-05t14-21-41.erl2.041912.png', 
                        'blue.2017-09-05t14-21-45.erl2.041914.png'], 
                        [09302] ]

matchtry = sorted(matchfiles[1], key = [i.split('.', 4)[-2] for i in 
matchfiles[1]])

Upvotes: 3

Views: 2572

Answers (4)

toornt

Reputation: 219

Yes, the issue is your key. You can use a lambda expression: https://en.wikipedia.org/wiki/Anonymous_function#Python

Imagine this as a mathematical map. The key being used to sort needs a function, so you define a lambda like:

lambda curr: curr.split('.')[-2]

This names each element of the list "curr" and evaluates the expression after the colon for it. So in your case this should do the trick:

matchtry = sorted(matchfiles[1], key=lambda curr: curr.split('.')[-2])
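
To make the difference concrete, here is a minimal sketch of both calls side by side. The shortened file names in `files` are made up for illustration and are not the asker's data.

```python
# Hypothetical, shortened file names used only for illustration.
files = ['blue.erl1.041909.png', 'blue.erl2.041908.png', 'blue.erl1.041905.png']

# Passing a list as key fails, because sorted() tries to call key(element).
try:
    sorted(files, key=[f.split('.')[-2] for f in files])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing a function works: sorted() calls it once per element.
by_number = sorted(files, key=lambda curr: curr.split('.')[-2])

print(error_message)
print(by_number)
```

The first call should reproduce the "not callable" error from the question, while the second returns the names ordered by their number field.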

Upvotes: 1

evamicur

Reputation: 403

Remember that the key argument to sorted takes each element of your iterable (list in your case) and converts it to some value. The values of each element after being transformed by key determine the sort order. So a simple way to get this to work every time is to define a function that takes one element and converts it to something that's easy to sort:

import os

def fname_to_value(fname):
    name, ext = os.path.splitext(fname)  # remove the extension
    number = name.split('.')[-1]  # text after the last remaining '.'
    return number  # string comparison is enough while the numbers are zero-padded to equal width

So now you have a function converting the filename to a sortable value. Simply supply this to sorted as the key argument and you're done.

matchtry = sorted(matchfiles[1], key = fname_to_value)
for match in matchtry:
    print(match)

result:

blue.2017-09-05t15-15-07.erl1.041905.png
blue.2017-09-05t15-15-11.erl1.041907.png
blue.2017-09-05t14-21-35.erl2.041908.png
blue.2017-09-05t15-15-14.erl1.041909.png
blue.2017-09-05t14-21-38.erl2.041910.png
blue.2017-09-05t14-21-41.erl2.041912.png
blue.2017-09-05t14-21-45.erl2.041914.png

You can then process the resulting list as needed.
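
One caveat on comparing the numbers as strings: it only matches numeric order while every number has the same width, as the zero-padded six-digit fields here do. A small sketch of what happens otherwise, using two made-up file names (`mixed`) and two hypothetical helper functions that are not part of the question:

```python
import os

def number_as_text(fname):
    # Same idea as fname_to_value above: strip the extension, keep the last field.
    name, _ext = os.path.splitext(fname)
    return name.split('.')[-1]

def number_as_int(fname):
    # Converting to int makes the comparison numeric instead of character by character.
    return int(number_as_text(fname))

# Hypothetical names whose number fields have different widths.
mixed = ['blue.erl1.99999.png', 'blue.erl1.100000.png']

text_order = sorted(mixed, key=number_as_text)
int_order = sorted(mixed, key=number_as_int)

print(text_order)  # '100000' sorts before '99999' as text
print(int_order)   # 99999 sorts before 100000 as a number
```

If the width of the number field can ever vary, the int version is the safer key.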

Upvotes: 1

David Scarlett

Reputation: 3341

The key parameter to sorted requires a function. [i.split('.', 4)[-2] for i in matchfiles[1]] is a list, not a function. The function is called on one element of the list at a time, so you need one that takes a string, splits it on the '.' character, and returns the second-to-last field, possibly converted to an integer.

Also, Python 3 does not allow decimal integer literals to begin with a zero, so you must change that [09302] to [9302]. (A leading 0 used to signal a non-decimal number. In Python 2, 0427 is octal 427, but in Python 3 octal numbers must be prefixed with 0o instead. 09302 is invalid in both versions, as an octal number cannot contain the digit 9.)

matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
                        'blue.2017-09-05t15-15-11.erl1.041907.png',
                        'blue.2017-09-05t15-15-14.erl1.041909.png',
                        'blue.2017-09-05t14-21-35.erl2.041908.png',
                        'blue.2017-09-05t14-21-38.erl2.041910.png',
                        'blue.2017-09-05t14-21-41.erl2.041912.png',
                        'blue.2017-09-05t14-21-45.erl2.041914.png'],
                        [9302] ]

matchtry = sorted(matchfiles[1], key=lambda s: int(s.split('.')[-2]))
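
The leading-zero rule only applies to literals in source code, not to text that is converted at run time. A short sketch of the distinction, assuming Python 3 (the variable names are just for illustration):

```python
# A literal such as 09302 is rejected when the source is compiled (Python 3).
try:
    compile("x = [09302]", "<demo>", "exec")
    literal_ok = True
except SyntaxError:
    literal_ok = False

# Octal literals need the 0o prefix in Python 3.
octal_value = 0o427

# Text with leading zeros converts without trouble.
from_text = int("041905")

# If the zeros matter for display, pad the number back out afterwards.
padded = "{:06d}".format(from_text)

print(literal_ok, octal_value, from_text, padded)
```

So converting the split-off field with int() is safe even though the same digits would not be accepted as a literal.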

Upvotes: 1

DeepSpace

Reputation: 81604

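  • The key argument expects a function, but you give it a list, hence the error 'list' object is not callable.

  • You should use split('.')[-2], which always takes the second-to-last element no matter how many '.' characters the name contains. (rsplit('.') without a maxsplit argument gives the same result.)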


matchfiles = [ [1723], ['blue.2017-09-05t15-15-07.erl1.041905.png',
                        'blue.2017-09-05t15-15-11.erl1.041907.png',
                        'blue.2017-09-05t15-15-14.erl1.041909.png',
                        'blue.2017-09-05t14-21-35.erl2.041908.png',
                        'blue.2017-09-05t14-21-38.erl2.041910.png',
                        'blue.2017-09-05t14-21-41.erl2.041912.png',
                        'blue.2017-09-05t14-21-45.erl2.041914.png'],
                        [9302] ]

matchtry = sorted(matchfiles[1], key=lambda x: x.rsplit('.')[-2])
print(matchtry)
# ['blue.2017-09-05t15-15-07.erl1.041905.png', 'blue.2017-09-05t15-15-11.erl1.041907.png',
#  'blue.2017-09-05t14-21-35.erl2.041908.png', 'blue.2017-09-05t15-15-14.erl1.041909.png',
#  'blue.2017-09-05t14-21-38.erl2.041910.png', 'blue.2017-09-05t14-21-41.erl2.041912.png',
#  'blue.2017-09-05t14-21-45.erl2.041914.png']
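
On the two splitting questions: omitting the second argument of split means "no limit", and rsplit with a maxsplit of 2 splits only twice, starting from the right. A minimal sketch, where `longer` is a made-up name with extra dots and `number_key` is a hypothetical helper:

```python
name = 'blue.2017-09-05t15-15-07.erl1.041905.png'
# Hypothetical longer name with additional '.' characters near the front.
longer = 'blue.extra.part.2017-09-05t15-15-07.erl1.041905.png'

# No maxsplit: split on every '.', however many there are.
all_parts = name.split('.')

# maxsplit=2 with rsplit: only the last two '.' are used, counting from the right.
last_two_splits = name.rsplit('.', 2)

def number_key(fname):
    # The number is the second-to-last piece regardless of dots earlier in the name.
    return fname.rsplit('.', 2)[-2]

print(all_parts)
print(last_two_splits)
print(number_key(longer))
```

Either form can be used as the key. Both assume every name contains at least one '.'; a name without one would raise an IndexError at [-2].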

Upvotes: 4
