ZapRowsdower

Reputation: 160

Sort a list of tuples based on name

I'm trying to sort a list of tuples based on the last names:

names = [
(123, 'Active', 'Michael Wilson Blessing'),
(456, 'Active', 'Tim Weaver Beadle'),
(789, 'Active', 'Lois Alan Beadle'),
...
]

What I want to do is sort this list based on last name. To accomplish this, I used

names = sorted(names, key=lambda x: x[2].split()[-1])

which works for unique last names. For duplicates such as 'Beadle', though, it doesn't sort the way it should: when two people share a last name, it should fall back to the first name (i.e., Lois Beadle, Tim Beadle, Michael Blessing). Is there a way to configure this with just the lambda, or do I need to create a function that accomplishes this, and what might that look like?

Upvotes: 1

Views: 240

Answers (3)

Chris

Reputation: 36621

If you want to sort by last name, then first name, then middle name (if present), you can build the key from the split name:

This requires Python 3.8 or later for the := operator.

sorted(names, key=lambda x: [(n := x[2].split())[-1], *n[:-1]])
# [(789, 'Active', 'Lois Alan Beadle'), 
#  (456, 'Active', 'Tim Weaver Beadle'), 
#  (123, 'Active', 'Michael Wilson Blessing'), 
#  (222, 'Active', 'Asha Baron Cohen'), 
#  (111, 'Active', 'Sasha Baron Cohen'), 
#  (999, 'Active', 'Sasha Cohen Cohen')]

This splits the name on whitespace, stores the result in n, and builds a list of the last name followed by any other names in their original order. Python compares these lists element by element (lexicographically), which produces the sorted order.
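To see what those lists actually look like, here is a small sketch that pulls the same key into a named function (`name_key` is a name introduced only for illustration) and prints it for the three rows from the question.

```python
# Same key as the lambda above, as a named function so it can be inspected.
def name_key(row):
    # Split the full name on whitespace; the last word is treated as the surname.
    parts = row[2].split()
    # Surname first, then the remaining names in their original order.
    return [parts[-1], *parts[:-1]]

rows = [
    (123, 'Active', 'Michael Wilson Blessing'),
    (456, 'Active', 'Tim Weaver Beadle'),
    (789, 'Active', 'Lois Alan Beadle'),
]

for row in rows:
    print(row[2], '->', name_key(row))
# Michael Wilson Blessing -> ['Blessing', 'Michael', 'Wilson']
# Tim Weaver Beadle -> ['Beadle', 'Tim', 'Weaver']
# Lois Alan Beadle -> ['Beadle', 'Lois', 'Alan']

# The two 'Beadle' keys tie on the first element, so the second
# element ('Lois' < 'Tim') decides their order.
print([row[2] for row in sorted(rows, key=name_key)])
# ['Lois Alan Beadle', 'Tim Weaver Beadle', 'Michael Wilson Blessing']
```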

I realize on refreshing the page that I've answered very similarly to Pranav. However, this does factor in all other names, and doesn't repeat a name if someone mononymous like Cher is in the list.
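A quick sketch of the mononym case, comparing this key with a (last, first) tuple key on a single-word name. Both function names are hypothetical, chosen only for the comparison.

```python
# Key used in this answer: last name, then all other names in order.
def all_names_key(full_name):
    n = full_name.split()
    return [n[-1], *n[:-1]]

# (last, first) key: for a one-word name, both positions hold the same word.
def last_first_key(full_name):
    s = full_name.split()
    return (s[-1], s[0])

print(all_names_key('Cher'))   # ['Cher']
print(last_first_key('Cher'))  # ('Cher', 'Cher')
```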

Upvotes: 1

pho

Reputation: 25489

You could do something like this if you have a version of Python that supports the walrus operator (3.8+):

sorted(names, key=lambda x: ((s:=x[2].split())[-1], s[0]))

This assigns x[2].split() to s, takes its last element (s[-1]) and its first element (s[0]), and returns them as a tuple. sorted takes care of the rest.
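On Python versions older than 3.8, where := is not available, a sketch of the same key just calls split() twice. It builds the same (last, first) tuple, at the cost of one extra split per element.

```python
names = [
    (123, 'Active', 'Michael Wilson Blessing'),
    (456, 'Active', 'Tim Weaver Beadle'),
    (789, 'Active', 'Lois Alan Beadle'),
]

# Without :=, the name is split once for the last name and once for the first.
by_last_then_first = sorted(
    names, key=lambda x: (x[2].split()[-1], x[2].split()[0])
)
print([row[2] for row in by_last_then_first])
# ['Lois Alan Beadle', 'Tim Weaver Beadle', 'Michael Wilson Blessing']
```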

However, I highly recommend against doing this. A regular function like @I'mahdi shows in their answer is much more readable, and no worse performance-wise. Here's a demonstration:

import timeit
import string
import random

from matplotlib import pyplot as plt

def randname():
    fname = random.choice(string.ascii_uppercase) + "".join(random.sample(string.ascii_lowercase, 6))
    mname = random.choice(string.ascii_uppercase) + "".join(random.sample(string.ascii_lowercase, 6))
    lname = random.choice(string.ascii_uppercase) + "".join(random.sample(string.ascii_lowercase, 6))
    return " ".join((fname, mname, lname))

def generate(m):
    active = "Active"
    return [(000, active, randname()) for _ in range(m)]

def sort_name(x):
    part_name = x[2].split()
    return (part_name[-1], part_name[0])

def func_sort(names):
    return sorted(names, key=sort_name)

def lambda_sort(names):
    return sorted(names, key=lambda x: ((s:=x[2].split())[-1], s[0]))

mvals = [10, 50, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10_000]
lsort = []
fsort = []

for m in mvals:
    names = generate(m)
    fsort.append(timeit.timeit("func_sort(names)", number=100, globals=globals()))
    lsort.append(timeit.timeit("lambda_sort(names)", number=100, globals=globals()))
    

plt.figure()
plt.plot(mvals, lsort, label="key=lambda_sort")
plt.plot(mvals, fsort, label="key=func_sort")
plt.legend()
plt.grid(True)
plt.xlabel("len(names)")
plt.ylabel("Time for 100 calls (s)")
plt.xscale("log")
plt.tight_layout()
plt.show()

The resulting plot shows that the performance of lambda_sort is basically identical to that of func_sort.
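Timing aside, the two versions compute identical key tuples, so (since sorted is stable) they should always return identical lists. A small sketch that checks this on random data; the fixed seed, the list size, and the `randname` helper taking an `rng` argument are arbitrary choices for this check.

```python
import random
import string

def randname(rng):
    # Three random capitalized 7-letter words: first, middle, last.
    def word():
        return rng.choice(string.ascii_uppercase) + "".join(rng.sample(string.ascii_lowercase, 6))
    return " ".join(word() for _ in range(3))

def sort_name(x):
    part_name = x[2].split()
    return (part_name[-1], part_name[0])

rng = random.Random(0)  # fixed seed so the check is reproducible
names = [(0, "Active", randname(rng)) for _ in range(1000)]

func_sorted = sorted(names, key=sort_name)
lambda_sorted = sorted(names, key=lambda x: ((s := x[2].split())[-1], s[0]))

# Identical keys plus a stable sort means identical output.
print(func_sorted == lambda_sorted)  # True
```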


Upvotes: 2

I'mahdi

Reputation: 24059

You can define a function and sort based on the last and first part of each name.

names = [
(222, 'Active', 'Asha Baron Cohen'),
(111, 'Active', 'Sasha Baron Cohen'),
(123, 'Active', 'Michael Wilson Blessing'),
(456, 'Active', 'Tim Weaver Beadle'),
(789, 'Active', 'Lois Alan Beadle')
]

def sort_name(x):
    part_name = x[2].split()
    # Sort by the last part of the name first; when last names are equal,
    # fall back to the first part of the name.
    return (part_name[-1], part_name[0])

    # To also break ties on the middle name, you could instead use:
    # return (part_name[-1], part_name[0], part_name[1:-1])
    # Note that (part_name[-2:], part_name[0]) would compare the middle name
    # *before* the last name, which changes the result.


names.sort(key=sort_name)  # sorts the list in place

print(names)

Output:

[(789, 'Active', 'Lois Alan Beadle'),
 (456, 'Active', 'Tim Weaver Beadle'),
 (123, 'Active', 'Michael Wilson Blessing'),
 (222, 'Active', 'Asha Baron Cohen'),
 (111, 'Active', 'Sasha Baron Cohen')]
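The `part_name[-2:]` alternative is worth checking before using it: that slice is `[middle, last]`, so it compares middle names before last names. A sketch comparing it with the (last, first) key on the same data; the function names are illustrative only.

```python
names = [
    (222, 'Active', 'Asha Baron Cohen'),
    (111, 'Active', 'Sasha Baron Cohen'),
    (123, 'Active', 'Michael Wilson Blessing'),
    (456, 'Active', 'Tim Weaver Beadle'),
    (789, 'Active', 'Lois Alan Beadle'),
]

def last_then_first(x):
    part_name = x[2].split()
    return (part_name[-1], part_name[0])

def middle_then_last(x):
    # part_name[-2:] is [middle, last], so the middle name is compared first.
    part_name = x[2].split()
    return (part_name[-2:], part_name[0])

print([x[0] for x in sorted(names, key=last_then_first)])
# [789, 456, 123, 222, 111]
print([x[0] for x in sorted(names, key=middle_then_last)])
# [789, 222, 111, 456, 123]
```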

Upvotes: 2
