Reputation: 31
I have a list (of dictionary keys), which I need to sort. This is my list:
listToBeSorted = ["Right Coronary Artery 2", "Right Coronary Artery 1", "RIght Coronary Artery 3"]
Obviously, the order in which I'd like to have these items sorted would be:
["Right Coronary Artery 1", "Right Coronary Artery 2", "RIght Coronary Artery 3"]
So I need to find a way to sort, ignoring the double blanks in the first item, and the uppercase "I" in the last item.
I tried the following sorting mechanisms:
Plain sorting:
sortedList = sorted(listToBeSorted)
will produce:
['RIght Coronary Artery 3',
'Right Coronary Artery 2',
'Right Coronary Artery 1']
Sorting, ignoring case:
sortedList = sorted(listToBeSorted, key=str.casefold)
will produce:
['Right Coronary Artery 2',
'Right Coronary Artery 1',
'RIght Coronary Artery 3']
Sorting, eliminating all blanks:
sortedList = sorted(listToBeSorted, key=lambda x: ''.join(x.split()))
will produce:
['RIght Coronary Artery 3',
'Right Coronary Artery 1',
'Right Coronary Artery 2']
I cannot change the entries themselves, as I need them to access the items in a dictionary later.
I eventually converted the list entries into a tuple, added an uppercase version without blanks, and sorted the list by the 2nd element of the tuple:
sortedListWithTwin = []
# Add an uppercase "twin" without whitespaces
for item in listToBeSorted:
    sortString = (item.upper()).replace(" ","")
    sortedListWithTwin.append((item, sortString))
# Sort list by the new "twin"
sortedListWithTwin.sort(key = lambda x: x[1])
# Remove the twin
sortedList = []
for item in sortedListWithTwin:
    sortedList.append(item[0])
This will produce the desired order:
['Right Coronary Artery 1',
'Right Coronary Artery 2',
'RIght Coronary Artery 3']
However, this solution seems very cumbersome and inefficient. What would be a better way to solve this?
Upvotes: 3
Views: 328
Reputation: 1447
I'll give an alternative method, using PyICU (a Python wrapper for icu4c). ICU has quite a powerful and flexible Collator class to allow tailored sorting.
I will include two methods:
To solve the problem in the question, I would activate numeric collation and set the collation strength to secondary (case distinctions are tertiary, so a secondary strength gives a caseless sort). Setting alternate handling to shifted addresses the whitespace issue in the question.
Setting attributes on collator
import icu
listToBeSorted = ["Right Coronary Artery 2", "Right Coronary Artery 1", "RIght Coronary Artery 3"]
collator = icu.Collator.createInstance(icu.Locale.getRoot())
collator.setAttribute(icu.UCollAttribute.NUMERIC_COLLATION, icu.UCollAttributeValue.ON)
collator.setStrength(icu.UCollAttributeValue.SECONDARY)
collator.setAttribute(icu.UCollAttribute.ALTERNATE_HANDLING, icu.UCollAttributeValue.SHIFTED)
sorted(listToBeSorted, key=collator.getSortKey)
Creating locale from BCP-47 language tag
import icu
listToBeSorted = ["Right Coronary Artery 2", "Right Coronary Artery 1", "RIght Coronary Artery 3"]
lang = "en-AU-u-kn-true-ka-shifted-kv-space-ks-level2"
loc = icu.Locale.forLanguageTag(lang)
collator = icu.Collator.createInstance(loc)
sorted(listToBeSorted, key=collator.getSortKey)
Both will result in ['Right Coronary Artery 1', 'Right Coronary Artery 2', 'RIght Coronary Artery 3']
In the BCP-47 version, I have restricted the alternate handling (shifting) to whitespace only. Alternatively, punctuation, symbols and currency symbols could have been included.
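For readers who cannot install PyICU, the combined effect of the three settings (caseless, whitespace ignored, digits compared as numbers) can be roughly approximated with the standard library. This is only a sketch, not ICU collation: `natural_key` is a hypothetical helper name, the sample list is made up for illustration (including where the double blank sits), and locale-specific ordering rules are not handled at all.

```python
import re

def natural_key(text):
    """Caseless, whitespace-insensitive key that compares digit runs as numbers."""
    # Ignore case and drop every whitespace character.
    compact = re.sub(r"\s+", "", text.casefold())
    # Splitting on a capturing group keeps the digit runs in the result,
    # so the parts alternate: text, digits, text, digits, ...
    parts = re.split(r"(\d+)", compact)
    # Odd positions hold digit runs; convert them so that 2 sorts before 10.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Assumed sample data, not taken from the question.
names = ["Right Coronary Artery 10", "Right  Coronary Artery 2", "RIght Coronary Artery 1"]
result = sorted(names, key=natural_key)
print(result)
```

Without the numeric step, "Artery 10" would sort before "Artery 2", because the comparison would be character by character.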
Upvotes: 0
Reputation: 5741
Sort using a lambda as the key:
sortedList = sorted(listToBeSorted, key=lambda x: x.casefold().replace(" ", ""))
print(sortedList)
If you don't want to use replace for some reason, you could use a regex instead. The re.sub() function replaces all whitespace characters with an empty string; \s+ matches one or more consecutive whitespace characters. casefold() is kept to ignore case.
import re
sortedList = sorted(listToBeSorted, key=lambda x: re.sub(r"\s+", "", x.casefold()))
print(sortedList)
Output:
['Right Coronary Artery 1',
'Right Coronary Artery 2',
'RIght Coronary Artery 3']
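Since the question says the entries must stay unchanged for later dictionary access, here is a small sketch showing that a key function only affects the comparison, not the strings themselves. The dictionary contents and the name `sort_key` are assumptions for illustration, as is the position of the double blank in the first key.

```python
import re

def sort_key(name):
    # Hypothetical helper: ignore case, then drop all whitespace.
    return re.sub(r"\s+", "", name.casefold())

# Assumed example data; the values are placeholders.
arteries = {
    "Right  Coronary Artery 2": "mid",
    "Right Coronary Artery 1": "proximal",
    "RIght Coronary Artery 3": "distal",
}

# Iterating a dict yields its keys; they come back unmodified, only reordered.
ordered_keys = sorted(arteries, key=sort_key)
ordered_values = [arteries[k] for k in ordered_keys]
print(ordered_values)
```

Because the original strings are returned as they were, each one still works as a lookup key in the dictionary.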
Upvotes: 4
Reputation: 27750
sortedList = sorted(listToBeSorted, key=lambda x: x.upper().replace(" ", ""))
print(sortedList)
#['Right Coronary Artery 1', 'Right Coronary Artery 2', 'RIght Coronary Artery 3']
Upvotes: -1