Reputation: 111
I have been working on an assignment that gathers data and counts how many times each item appears in a large dataset (about 500 MB). I have a couple of dictionaries that read CSV files and combine the data, and my final dict looks like the one below after all of the data has been gathered and processed.
I am almost done with the assignment but am stuck on this section: I need to find the top 5 max values across all keys and values.
I have the following dictionary, printed using print key, task1[key]:
KEY KEYVALUE
WA [[('1082225', 29), ('845195', 21), ('265021', 17)]]
DE [[('922397', 44), ('627084', 40), ('627297', 14)]]
DC [[('774648', 17), ('911624', 17), ('771241', 16)]]
WI [[('12618', 25), ('242582', 23), ('508727', 22)]]
WV [[('476050', 4), ('1016620', 3), ('769611', 3)]]
HI [[('466263', 5), ('226000', 5), ('13694', 4)]]
I pretty much need to go through all of them and find the top 5 values overall, along with their ID numbers. For example
What would be the best way to do this?
EDIT: how I am putting together my task1 dictionary:
import operator

task1 = {}
for key, val in courses.items():
    # Keep the five (id, count) pairs with the highest counts for each key.
    task1[key] = [sorted(val.iteritems(), key=operator.itemgetter(1), reverse=True)[:5]]
Upvotes: 0
Views: 111
Reputation: 455
Assuming your dict looks something like:
mydict = {'WA': [('1082225', 29), ('845195', 21), ('265021', 17)],
'DE': [('922397', 44), ('627084', 40), ('627297', 14)],
...}
This is not the ideal representation. You can flatten it into a more convenient list of tuples with:
data = [(k, idnum, v) for k, kvlist in mydict.items() for idnum, v in kvlist]
Now the data will look like:
[('WA', '1082225', 29),
('WA', '845195', 21),
('WA', '265021', 17),
('DE', '922397', 44),
...]
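The dictionary printed in the question has one more level of nesting than mydict above, since each value is a list wrapped in another list. As a rough sketch, assuming that exact shape, the flattening step could unwrap the extra level like this. The function name flatten and the two-state sample are only illustrative:

```python
# Sample data copied from the question, including the extra list wrapper.
task1 = {
    'WA': [[('1082225', 29), ('845195', 21), ('265021', 17)]],
    'DE': [[('922397', 44), ('627084', 40), ('627297', 14)]],
}


def flatten(nested):
    """Turn {state: [[(idnum, count), ...]]} into (state, idnum, count) tuples."""
    return [(state, idnum, count)
            for state, wrapped in nested.items()
            for pairs in wrapped            # unwrap the extra list level
            for idnum, count in pairs]


data = flatten(task1)
```

If the extra wrapper is dropped when task1 is built, the simpler comprehension shown earlier is enough.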
In this format the data is easy to read, and it is clear what needs to be searched. This line sorts the tuples in descending order by the count stored at index [2]:
sorted(data, key=lambda x: x[2], reverse=True)
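Since the question asks only for the top 5, one option is to slice the sorted result with [:5]. Another, sketched here, is heapq.nlargest from the standard library, which avoids sorting the whole list. The helper name top_n and the sample rows (taken from the question's WA, DE and WI entries) are only illustrative:

```python
import heapq

# Flattened (state, idnum, count) rows built from the question's sample data.
data = [
    ('WA', '1082225', 29), ('WA', '845195', 21), ('WA', '265021', 17),
    ('DE', '922397', 44), ('DE', '627084', 40), ('DE', '627297', 14),
    ('WI', '12618', 25), ('WI', '242582', 23), ('WI', '508727', 22),
]


def top_n(rows, n=5):
    """Return the n rows with the largest count (index 2), largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[2])


top5 = top_n(data)
for state, idnum, count in top5:
    print("%s %s %d" % (state, idnum, count))
```

For a list this small the two approaches are interchangeable; nlargest mainly helps when the flattened list is large and only a few rows are needed.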
Note: the dictionary you provided has an unnecessary [] around each list, so I removed it from the answer for clarity.
Edited after clarification.
Upvotes: 2