Reputation: 160
I'm trying to sort a list of tuples based on the last names:
names = [
(123, 'Active', 'Michael Wilson Blessing'),
(456, 'Active', 'Tim Weaver Beadle'),
(789, 'Active', 'Lois Alan Beadle'),
...
]
What I want to do is sort this list based on last name. To accomplish this, I used
otherlist = sorted(otherlist, key=lambda x: x[2])
which works for unique last names, but for duplicates such as 'Beadle' it doesn't sort the way it should: when two entries share a last name, it should fall back to the first name (e.g., Michael Blessing, Lois Beadle, Tim Beadle). Is there a way to do this with just the lambda, or do I need to write a function for it, and what might that look like?
Upvotes: 1
Views: 240
Reputation: 36621
If you want to sort by last name, then first name, then middle name (if present), you can build the key with an assignment expression. This requires Python 3.8 or later for the := operator.
sorted(names, key=lambda x: [(n := x[2].split())[-1], *n[:-1]])
# [(789, 'Active', 'Lois Alan Beadle'),
# (456, 'Active', 'Tim Weaver Beadle'),
# (123, 'Active', 'Michael Wilson Blessing'),
# (222, 'Active', 'Asha Baron Cohen'),
# (111, 'Active', 'Sasha Baron Cohen'),
# (999, 'Active', 'Sasha Cohen Cohen')]
This splits the name on whitespace, stores the result in n, and builds a list out of the last name followed by any other names in their original order. sorted then compares these lists element by element to produce the sorted list.
I realize on refreshing the page that I've answered very similarly to Pranav. However, this does factor in all other names, and doesn't repeat a name if someone mononymous like Cher is in the list.
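To see what the key actually compares, the same logic can be pulled out into a named function and applied to individual records. The name last_first_key and the 'Cher' record are made up for this sketch; the splitting logic is the same as in the lambda above.

```python
def last_first_key(record):
    # Split the full name on whitespace; the last word is treated as the surname
    parts = record[2].split()
    # Surname first, then the remaining names in their original order
    return [parts[-1], *parts[:-1]]

print(last_first_key((789, 'Active', 'Lois Alan Beadle')))  # ['Beadle', 'Lois', 'Alan']
# Hypothetical mononymous record: the single name appears only once in the key
print(last_first_key((0, 'Active', 'Cher')))  # ['Cher']
```

Because the key is a plain list, a one-word name simply produces a shorter list, and Python can still compare it against the longer ones.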
Upvotes: 1
Reputation: 25489
You could do something like this if your version of Python supports the walrus operator (3.8 or later):
sorted(names, key=lambda x: ((s:=x[2].split())[-1], s[0]))
This assigns x[2].split() to s, selects its last element (index -1) and its first element (index 0), puts them in a tuple, and returns the tuple. sorted takes care of the rest.
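sorted can "take care of the rest" because tuples compare element by element: last names are compared first, and first names only matter when the last names are equal. If you are on a Python version older than 3.8 (no walrus operator), a sketch that simply calls split() twice should give the same ordering, at the cost of splitting each name twice. The three records are taken from the question.

```python
names = [
    (123, 'Active', 'Michael Wilson Blessing'),
    (456, 'Active', 'Tim Weaver Beadle'),
    (789, 'Active', 'Lois Alan Beadle'),
]
# Tuples compare element by element: the second item only matters when the first items are equal
print(('Beadle', 'Lois') < ('Beadle', 'Tim') < ('Blessing', 'Michael'))  # True
# Same (last name, first name) key without :=, so it also works before Python 3.8;
# the trade-off is that each name is split twice
result = sorted(names, key=lambda x: (x[2].split()[-1], x[2].split()[0]))
for row in result:
    print(row)
```

This prints Lois Beadle, then Tim Beadle, then Michael Blessing.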
However, I strongly recommend against doing this. A regular function like the one @I'mahdi shows in their answer is much more readable, and no worse performance-wise. Here's a demonstration:
import timeit
import string
import random
from matplotlib import pyplot as plt
def randname():
    # Three random capitalized 7-letter words: first, middle and last name
    fname = random.choice(string.ascii_uppercase) + "".join(random.sample(string.ascii_lowercase, 6))
    mname = random.choice(string.ascii_uppercase) + "".join(random.sample(string.ascii_lowercase, 6))
    lname = random.choice(string.ascii_uppercase) + "".join(random.sample(string.ascii_lowercase, 6))
    return " ".join((fname, mname, lname))
def generate(m):
    active = "Active"
    return [(000, active, randname()) for _ in range(m)]
def sort_name(x):
    part_name = x[2].split()
    return (part_name[-1], part_name[0])
def func_sort(names):
    return sorted(names, key=sort_name)
def lambda_sort(names):
    return sorted(names, key=lambda x: ((s:=x[2].split())[-1], s[0]))
mvals = [10, 50, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10_000]
lsort = []
fsort = []
for m in mvals:
    names = generate(m)
    # timeit finds func_sort, lambda_sort and names through globals()
    fsort.append(timeit.timeit("func_sort(names)", number=100, globals=globals()))
    lsort.append(timeit.timeit("lambda_sort(names)", number=100, globals=globals()))
plt.figure()
plt.plot(mvals, lsort, label="key=lambda_sort")
plt.plot(mvals, fsort, label="key=func_sort")
plt.legend()
plt.grid(True)
plt.xlabel("len(names)")
plt.ylabel("Time for 100 calls (s)")
plt.xscale("log")
plt.tight_layout()
plt.show()
The resulting plot shows that the performance of lambda_sort is basically identical to that of func_sort.
Upvotes: 2
Reputation: 24059
You can define a function and sort based on the last and first part of each name.
names = [
(222, 'Active', 'Asha Baron Cohen'),
(111, 'Active', 'Sasha Baron Cohen'),
(123, 'Active', 'Michael Wilson Blessing'),
(456, 'Active', 'Tim Weaver Beadle'),
(789, 'Active', 'Lois Alan Beadle')
]
def sort_name(x):
    part_name = x[2].split()
    # Sort by the last part of the name first; when last names are equal, fall back to the first part
    return (part_name[-1], part_name[0])
    # You could also take the second (middle) part of the name into account, like below.
    # Note that part_name[-2:] is [middle, last], so the middle name is compared before the last name
    # return (part_name[-2:], part_name[0])
    # The result will change if you use this approach
names.sort(key=sort_name)
print(names)
Output:
[(789, 'Active', 'Lois Alan Beadle'),
(456, 'Active', 'Tim Weaver Beadle'),
(123, 'Active', 'Michael Wilson Blessing'),
(222, 'Active', 'Asha Baron Cohen'),
(111, 'Active', 'Sasha Baron Cohen')]
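If you want the middle name as an extra tiebreaker while still sorting by last name first, a sketch of a key function could put the surname first, then the first name, then any middle names. The name sort_name_full is hypothetical; the records, including 'Sasha Cohen Cohen', come from the examples in this thread.

```python
def sort_name_full(x):
    part_name = x[2].split()
    # Surname first, then first name, then any middle names in their original order
    return (part_name[-1], part_name[0], *part_name[1:-1])

names = [
    (222, 'Active', 'Asha Baron Cohen'),
    (111, 'Active', 'Sasha Baron Cohen'),
    (999, 'Active', 'Sasha Cohen Cohen'),
    (123, 'Active', 'Michael Wilson Blessing'),
    (456, 'Active', 'Tim Weaver Beadle'),
    (789, 'Active', 'Lois Alan Beadle'),
]
names.sort(key=sort_name_full)
print([n[0] for n in names])  # [789, 456, 123, 222, 111, 999]
```

The two Sasha Cohens tie on last and first name, so their middle names ('Baron' vs 'Cohen') decide the order.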
Upvotes: 2