AJG519

Reputation: 3379

Sort strings in Python list using another list

Say I have the following lists:

List1=['Name1','Name3','Color1','Size2','Color3','Color2','Name2','Size1', 'ID']
List2=['ID','Color1','Color2','Size1','Size2','Name1','Name2']

Each list has one element named "ID" plus elements from three other categories (Name, Color, and Size), with an unknown number of elements in each category.

I want to sort these elements according to the following 'sort list', without knowing in advance how many there will be in each category:

SortList=['ID','Name','Size','Color']

I can get the desired output (see below) although I imagine there is a better / more pythonic way of doing so.

>>> def SortMyList(MyList,SortList):       
...     SortedList=[]       
...     for SortItem in SortList:
...         SortItemList=[]
...         for Item in MyList:
...             ItemWithoutNum="".join([char for char in Item if char.isalpha()])  
...             if SortItem==ItemWithoutNum:
...                 SortItemList.append(Item)
...         if len(SortItemList)>1:
...             SortItemList=[SortItem+str(I) for I in range(1,len(SortItemList)+1)]
...         for SortedItem in SortItemList:
...             SortedList.append(SortedItem)
...     return SortedList
... 
>>> 
>>> SortMyList(List1, SortList)
['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
>>> SortMyList(List2, SortList)
['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']
>>> 

Any suggestions as to how my methodology or my code can be improved?

Upvotes: 6

Views: 1789

Answers (4)

GingerPlusPlus

Reputation: 5606

Is there (in this case) an easier way to extract data from a string than simple regexes?

import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

Usage:

   SortList = ['ID', 'Name', 'Size', 'Color']
   List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2','Name2', 'Size1', 'ID']
   List2 = ['ID', 'Color1', 'Color2', 'Size1', 'Size2', 'Name1', 'Name2']
   sorted(List1, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
   sorted(List2, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']

Explanation:

^[a-zA-Z]+ matches the alphabetic part at the beginning, and \d+$ matches the numeric part at the end of the string.

keygen returns a lambda that takes a string and returns a two-item tuple:
the first item is the position of the alphabetic part in the list (if there is no such item in the list, a ValueError is raised),
the second is a one-item list containing the numeric part at the end, or an empty list if the string doesn't end with a digit.
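
To make the key concrete, here is a small sketch that prints the tuples produced for a few sample strings. The sample inputs (including 'Color12') are chosen only for illustration:

```python
import re

def keygen(sort_list):
    # Key: (position of the alphabetic prefix, trailing digits as a list of strings)
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

key = keygen(['ID', 'Name', 'Size', 'Color'])
print(key('ID'))       # (0, [])
print(key('Name3'))    # (1, ['3'])
print(key('Color12'))  # (3, ['12'])
```

Tuples compare item by item, so the prefix position decides the order first and the digits only break ties within a category.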

Some possible improvements:

  • the sort_list.index call is O(n) and is made for each element in the list; it can be replaced with an O(1) dict lookup to speed up sorting (I didn't do that to keep things simple),
  • the numeric part can be converted into actual integers (1 < 2 < 10, but '1' < '10' < '2')

After applying those:

import re

def keygen(sort_list):
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )
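
A quick check of this version, as a sketch: the lookup is built as a word-to-position dict (named position here, my choice), and 'Name10' is a made-up element added to exercise multi-digit suffixes.

```python
import re

def keygen(sort_list):
    # Word -> position lookup, built once per sort
    position = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        position[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]  # digits compared as integers
    )

# 'Name10' is hypothetical input, not from the question
items = ['Name10', 'Name2', 'ID', 'Size1', 'Name1']
result = sorted(items, key=keygen(['ID', 'Name', 'Size', 'Color']))
print(result)  # ['ID', 'Name1', 'Name2', 'Name10', 'Size1']
```

With string suffixes, 'Name10' would land between 'Name1' and 'Name2' instead.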

Upvotes: 0

Garrett R

Reputation: 2662

This works as long as you know that List2 only contains strings that start with one of the entries in sortList:

List2=['ID','Color4','Color2','Size1','Size2','Name2','Name1']
sortList=['ID','Name','Size','Color']
def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])

print(sorted(List2, key=sort_fun))
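
To illustrate why that precondition matters, here is a sketch assuming Python 3. 'Weight1' is a made-up element that matches no prefix, so sort_fun falls through and returns None, which cannot be ordered against the tuples returned for the other elements:

```python
sortList = ['ID', 'Name', 'Size', 'Color']

def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])
    # No matching prefix: falls through and implicitly returns None

print(sort_fun('Size2'))    # (2, '2')
print(sort_fun('Weight1'))  # None

try:
    sorted(['ID', 'Weight1'], key=sort_fun)  # 'Weight1' is hypothetical input
    failed = False
except TypeError:
    # Python 3 refuses to compare None with a tuple
    failed = True
print(failed)  # True on Python 3
```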

Upvotes: 2

B. M.

Reputation: 18628

You can just provide an appropriate key:

List1.sort( key = lambda x : ('INSC'.index(x[0]),x[-1]))
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

The elements are sorted by their first letter, then by their last character (the digit, if there is one). It works here because all the first letters are different and the numbers have at most one digit.
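
As a sketch of where the one-digit assumption breaks, consider a made-up list containing 'Name10'. Only the last character is used as the secondary key, so 'Name10' is keyed on '0':

```python
# 'Name10' is hypothetical input, not from the question
items = ['Name10', 'Name2', 'Name1', 'ID']
items.sort(key=lambda x: ('INSC'.index(x[0]), x[-1]))
print(items)  # ['ID', 'Name10', 'Name1', 'Name2']
```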

EDIT

For multi-digit numbers, a more obfuscated solution:

import re

List1.sort( key =lambda x : ('INSC'.index(x[0]),int("0"+"".join(re.findall(r'\d+',x)))))
 # ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

Upvotes: 1

shx2

Reputation: 64318

You can sort the list using a custom key function, which returns a 2-tuple, for primary sorting and secondary sorting.

Primary sorting is by the order of your "tags" (ID first, then Name, etc.). Secondary sorting is by the numeric value following it.

tags = ['ID','Name','Size','Color']
sort_order = { tag : i for i,tag in enumerate(tags) }

def elem_key(x):
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag) : ]
            return ( sort_order[tag],
                     int(suffix) if suffix else None )
    raise ValueError("element %s is not prefixed by a known tag. order is not defined" % x)

List1.sort(key = elem_key)
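
A runnable sketch of this approach on the question's List1. One change is mine, not part of the answer: the missing-suffix case returns 0 instead of None, so that a bare tag and a numbered one (say 'Name' and 'Name1') would still be comparable on Python 3. 'Weight1' is a made-up element used to show the error path.

```python
tags = ['ID', 'Name', 'Size', 'Color']
sort_order = {tag: i for i, tag in enumerate(tags)}

def elem_key(x):
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag):]
            # Assumption: 0 stands in for "no number" (the answer uses None)
            return (sort_order[tag], int(suffix) if suffix else 0)
    raise ValueError("element %s is not prefixed by a known tag" % x)

List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2', 'Name2', 'Size1', 'ID']
List1.sort(key=elem_key)
print(List1)
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

try:
    elem_key('Weight1')  # hypothetical element with an unknown tag
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```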

Upvotes: 5
