Reputation: 3379
Say I have the following lists:
List1=['Name1','Name3','Color1','Size2','Color3','Color2','Name2','Size1', 'ID']
List2=['ID','Color1','Color2','Size1','Size2','Name1','Name2']
Each list will contain an element named "ID" plus elements from 3 other categories (Name, Color, and Size), with an unknown number of elements in each category.
I want to sort these variables without knowing how many there will be in each category with the following 'sort list':
SortList=['ID','Name','Size','Color']
I can get the desired output (see below) although I imagine there is a better / more pythonic way of doing so.
>>> def SortMyList(MyList,SortList):
...     SortedList=[]
...     for SortItem in SortList:
...         SortItemList=[]
...         for Item in MyList:
...             ItemWithoutNum="".join([char for char in Item if char.isalpha()])
...             if SortItem==ItemWithoutNum:
...                 SortItemList.append(Item)
...         if len(SortItemList)>1:
...             SortItemList=[SortItem+str(I) for I in range(1,len(SortItemList)+1)]
...         for SortedItem in SortItemList:
...             SortedList.append(SortedItem)
...     return SortedList
...
>>>
>>> SortMyList(List1, SortList)
['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
>>> SortMyList(List2, SortList)
['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']
>>>
Any suggestions as to how my methodology or my code can be improved?
Upvotes: 6
Views: 1789
Reputation: 5606
Is there (in this case) an easier way to extract data from a string than simple regexes?
import re
def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )
SortList = ['ID', 'Name', 'Size', 'Color']
List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2','Name2', 'Size1', 'ID']
List2 = ['ID', 'Color1', 'Color2', 'Size1', 'Size2', 'Name1', 'Name2']
sorted(List1, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
sorted(List2, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']
^[a-zA-Z]+ matches the alphabetic part at the beginning, and \d+$ matches the numeric part at the end of the string.
keygen returns a lambda that takes a string and returns a two-item tuple:
the first item is the position of the alphabetic part in the list (if there is no such item in the list, a ValueError is raised),
the second item is a one-item list containing the numeric part at the end, or an empty list if the string doesn't end with a digit.
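To make the shape of those keys concrete, here is a small sketch that repeats the keygen definition above and prints the tuple it produces for a few sample strings. The sample inputs ('Color12', 'Shape1') are made up for illustration.

```python
import re

def keygen(sort_list):
    # Same key function as above: (position of the prefix, list with the trailing digits)
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem),
    )

key = keygen(['ID', 'Name', 'Size', 'Color'])

print(key('ID'))       # (0, [])
print(key('Name3'))    # (1, ['3'])
print(key('Color12'))  # (3, ['12'])

# A prefix that is not in the sort list makes list.index raise ValueError.
try:
    key('Shape1')
    unknown_prefix_rejected = False
except ValueError:
    unknown_prefix_rejected = True
```

Note that the numeric part is still a string at this point, which matters once the numbers have more than one digit.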
Possible improvements:
the sort_list.index call is O(n) and is made for each element in the list; it can be replaced with an O(1) dict lookup to speed up sorting (I didn't do that to keep things simple),
the numeric part is compared as a string and should be converted to int (1 < 2 < 10, but '1' < '10' < '2').
After applying those:
import re
def keygen(sort_list):
    # Map each prefix to its position once, so lookups are O(1)
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )
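The reason for the int conversion can be checked in isolation. The snippet below is only an illustration with made-up suffix values, not part of the key function itself.

```python
# Strings compare character by character, so '10' sorts before '2'.
suffixes = ['10', '2', '1']

as_strings = sorted(suffixes)           # lexicographic order
as_numbers = sorted(suffixes, key=int)  # numeric order

print(as_strings)  # ['1', '10', '2']
print(as_numbers)  # ['1', '2', '10']
```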
Upvotes: 0
Reputation: 2662
This works as long as you know that List2 only contains strings that start with one of the items in sortList.
List2=['ID','Color4','Color2','Size1','Size2','Name2','Name1']
sortList=['ID','Name','Size','Color']
def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            # (position of the prefix, remaining suffix as a string)
            return (i, x[len(thing):])
print(sorted(List2, key=sort_fun))
Upvotes: 2
Reputation: 18628
You can just provide an appropriate key:
List1.sort( key = lambda x : ('INSC'.index(x[0]),x[-1]))
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
The elements will be sorted by the first letter, then by the last character (the digit, if there is one). It works here because all the first letters are different and the numbers have at most one digit.
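A quick sketch of where the one-digit assumption breaks down, using a made-up list that contains 'Name10': only the last character is compared, so 'Name10' is keyed on '0' and lands before 'Name1'.

```python
# Same key as above: (position of the first letter in 'INSC', last character)
key = lambda x: ('INSC'.index(x[0]), x[-1])

items = ['Name2', 'Name10', 'Name1', 'ID']  # hypothetical input with a two-digit number
result = sorted(items, key=key)

print(result)  # ['ID', 'Name10', 'Name1', 'Name2']
```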
EDIT
For numbers with several digits, a more obfuscated solution (it needs import re):
import re
List1.sort( key =lambda x : ('INSC'.index(x[0]),int("0"+"".join(re.findall(r'\d+',x)))))
# ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
Upvotes: 1
Reputation: 64318
You can sort the list using a custom key function, which returns a 2-tuple, for primary sorting and secondary sorting.
Primary sorting is by the order of your "tags" (ID first, then Name, etc.). Secondary sorting is by the numeric value following it.
tags = ['ID','Name','Size','Color']
sort_order = { tag : i for i,tag in enumerate(tags) }
def elem_key(x):
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag) : ]
            return ( sort_order[tag],
                     int(suffix) if suffix else None )
    raise ValueError("element %s is not prefixed by a known tag. order is not defined" % x)
List1.sort(key = elem_key)
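The 2-tuple key relies on tuples being compared element by element. The sketch below shows that ordering, along with one caveat that assumes Python 3: None and int cannot be ordered against each other, so the None suffix is only safe when no bare tag (such as a hypothetical 'Name') appears alongside numbered elements of the same tag.

```python
# Tuples compare left to right; later items only matter when earlier ones tie.
assert (0, None) < (1, 1)   # tag positions differ, so None is never compared
assert (1, 2) < (1, 10)     # same tag position, the integer suffix decides

# Assumption: Python 3. A tie on the tag position forces None to be compared with an int.
try:
    (1, None) < (1, 1)
    none_is_orderable = True
except TypeError:
    none_is_orderable = False

print(none_is_orderable)  # False on Python 3
```

With the lists from the question this never happens, because 'ID' is the only element without a number and it has its own tag.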
Upvotes: 5