Reputation: 121
Working on a project for CS1, and I am close to cracking it, but this part of the code has stumped me! The object of the project is to create a list of the top 20 names in any given year by referencing a file with thousands of names in it. Each line in each file contains the name, gender, and how many times it occurs. The file is separated by gender (so female names in order of their occurrences, followed by male names in order of their occurrences). I have gotten the code to a point where each entry is an instance of a class, stored in a list (so printing the list shows a long list of memory entries). Here is the code I have up to this point.
class entry():
    __slots__ = ('name', 'sex', 'occ')

def mkEntry(name, sex, occ):
    dat = entry()
    dat.name = name
    dat.sex = sex
    dat.occ = occ
    return dat

##test = mkEntry('Mary', 'F', '7065')
##print(test.name, test.sex, test.occ)

def readFile(fileName):
    fullset = []
    for line in open(fileName):
        val = line.split(",")
        sett = mkEntry(val[0], val[1], int(val[2]))
        fullset.append(sett)
    return fullset

fullset = readFile("names/yob1880.txt")
print(fullset)
What I am wondering is whether I can sort this list with sort() or a similar function, ordering it by occurrences (dat.occ in each entry). The result would be a list sorted independently of gender, and I could then print the first entries in the list, as they should be what I am seeking. Is it possible to sort the list like this?
Upvotes: 0
Views: 5059
Reputation: 77069
I think you just want to sort on the value of the 'occ' attribute of each object, right? You just need to use the key keyword argument to any of the various ordering functions that Python has available. For example:
getocc = lambda entry: entry.occ
sorted(fullset, key=getocc)
# or, for in-place sorting
fullset.sort(key=getocc)
Or perhaps some may think it's more Pythonic to use operator.attrgetter instead of a custom lambda:
import operator
getocc = operator.attrgetter('occ')
sorted(fullset, key=getocc)
But it sounds like the list is pretty big. If you only want the first few entries in the list, sorting may be an unnecessarily expensive operation. For example, if you only want the first value you can get that in O(N) time:
min(fullset, key=getocc) # Same getocc as above
If you want the first three, say, you can use a heap instead of sorting.
import heapq
heapq.nsmallest(3, fullset, key=getocc)
A heap is a useful data structure for getting a slice of ordered elements from a list without sorting the whole list. The above is equivalent to sorted(fullset, key=getocc)[:3], but faster if the list is large.
Hopefully it's obvious you can get the three largest with heapq.nlargest and the same arguments. Likewise you can reverse any of the sorts or replace min with max.
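Applied to the question's "top names" goal, that could look like the sketch below. The Entry namedtuple and the sample counts are hypothetical stand-ins for the asker's entry objects and file data, and the example asks for the top 3 rather than 20 only to keep the sample short.

```python
import heapq
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the question's entry class; any object
# with an `occ` attribute works the same way.
Entry = namedtuple("Entry", ["name", "sex", "occ"])

# Illustrative counts, not taken from the real names file.
fullset = [
    Entry("Mary", "F", 700),
    Entry("Anna", "F", 260),
    Entry("Emma", "F", 200),
    Entry("John", "M", 960),
    Entry("William", "M", 950),
]

# The n entries with the highest occ, largest first, regardless of sex.
top = heapq.nlargest(3, fullset, key=attrgetter("occ"))

for e in top:
    print(e.name, e.sex, e.occ)
```

For the actual assignment the first argument would be 20 instead of 3.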
Upvotes: 0
Reputation: 22882
Yes, you can sort lists of objects using sort(). sort() takes a function as an optional argument key. The key function is applied to each element in the list before making the comparisons. For example, if you wanted to sort a list of integers by their absolute value, you could do the following:
>>> a = [-5, 4, 6, -2, 3, 1]
>>> a.sort(key=abs)
>>> a
[1, -2, 3, 4, -5, 6]
In your case, you need a custom key that will extract the number of occurrences for each object, e.g.
def get_occ(d): return d.occ
fullset.sort(key=get_occ)
(You could also do this using an anonymous function: fullset.sort(key=lambda d: d.occ).) Then you just need to extract the top 20 elements from this list.
Note that by default sort orders the elements in ascending order (it sorts the list in place and returns None), which you can change with the reverse argument, e.g. fullset.sort(key=get_occ, reverse=True)
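Putting the two steps together (sort descending, then take the first n), a minimal sketch might look like this. The Entry class, the top_n helper, and the sample counts are all hypothetical, written only to mirror the shape of the objects in the question.

```python
class Entry:
    """Hypothetical stand-in for the question's entry class."""
    __slots__ = ("name", "sex", "occ")

    def __init__(self, name, sex, occ):
        self.name = name
        self.sex = sex
        self.occ = occ


def top_n(entries, n=20):
    """Return the n entries with the highest occ, largest first.

    Uses sorted() so the caller's list is left unchanged.
    """
    return sorted(entries, key=lambda d: d.occ, reverse=True)[:n]


# Illustrative counts, not taken from the real names file.
sample = [
    Entry("Mary", "F", 700),
    Entry("Anna", "F", 260),
    Entry("John", "M", 960),
    Entry("William", "M", 950),
]

for d in top_n(sample, n=2):
    print(d.name, d.sex, d.occ)
```

Slicing with [:n] is safe even when the list has fewer than n entries; it just returns everything there is.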
Upvotes: 2
Reputation: 31
You mean you want to sort the list only by occ? sort() has a parameter named key, which you can use like this:
fullset.sort(key=lambda x: x.occ)
Upvotes: 0
Reputation: 387557
This sorts the list in descending order by the occ attribute:
fullset.sort(key=lambda x: x.occ, reverse=True)
Upvotes: 0