Reputation: 129
I have a Python list like this:
Category Title ProductId Rating
'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4
'Electronics, Books, Bundles' Lautner stand in e-Reader Cover 161552 3
'Electronics, Books, Bundles' Lautner Chocolate NOOK Case 594451 5
'Electronics, Books, Bundles' Oliver e-Reader Cover 161685 1
'Electronics, Books, Covers' Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033 4.3
'Electronics, Books, Covers' Emerson Quote e-Reader Cover 161542 2.8
'Electronics, Books, Covers' Industriell Easel e-Reader Cover 161682 3.7
'Electronics, Books, Covers' Jonathan Adler Book Reader Cover Hd - Elephant 594548 4.9
'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR 161683 4
'Electronics, Scanners, Covers' Nook Tablet Dessin Cover in Marine 161686 3.8
'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red 594202 4.2
'Electronics, Scanners, Covers' Canvas Bella Library Cover 161554 3
'Electronics, Books, Radios' Groovy Protective Stand Cover: Custom Designed for 7-inch NOOK HD 594549 3.8
'Electronics, Books, Radios' Hd Groovy Stand In Blue- Nook 594514 4.1
'Electronics, Books, Radios' Hutton Envelope in Bark 161560 2.9
'Electronics, Books, Radios' Italian Leather-Style Chesterton Cover for NOOK Reader 161561 4
From all these values, I want the top k items (by Rating) in each category. For k = 2 the result should be:
'Electronics, Books, Bundles' Lautner Chocolate NOOK Case 594451 5
'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4
'Electronics, Books, Covers' Jonathan Adler Book Reader Cover Hd - Elephant 594548 4.9
'Electronics, Books, Covers' Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033 4.3
'Electronics, Books, Radios' Hd Groovy Stand In Blue- Nook 594514 4.1
'Electronics, Books, Radios' Italian Leather-Style Chesterton Cover for NOOK Reader 161561 4
'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red 594202 4.2
'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR 161683 4
Here is what I have tried:
```python
import operator
import sys

# sort every row by the element at index 1, highest first
sorted_data = sorted(data, key=operator.itemgetter(1), reverse=True)
k = int(sys.argv[1])
for result in sorted_data[:k]:
    print(result)
```
Here I pass k as a command-line argument to the Python script.
Upvotes: 2
Views: 12464
Reputation: 11
I assume you are looking for something similar to this. This is the code.
Your list is too long, so I used a shorter list here. This is the result that I got.
Upvotes: 0
Reputation: 3189
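Using iterators (itertools.groupby, heapq.nlargest and itertools.chain), you can get reasonably efficient performance. Note: this uses only the Python standard library. Because itertools.groupby only groups consecutive elements, the list has to be sorted by category first.
```python
import heapq
import itertools

# some_list is assumed to hold (Category, Title, ProductId, Rating) rows with numeric ratings
# groupby only groups *adjacent* elements, so sort by 'Category' first
sorted_list = sorted(some_list, key=lambda element: element[0])

# group by 'Category'
groups = itertools.groupby(sorted_list, key=lambda element: element[0])

# take the top two of each group based on 'Rating'
top_two_of_each = (heapq.nlargest(2, values, key=lambda value: value[3])
                   for _, values in groups)

# flatten the nested iterators
top_two_of_each_flattened = itertools.chain(*top_two_of_each)

# convert the iterator into a list
top_two_of_each_flattened_as_list = list(top_two_of_each_flattened)
```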
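A self-contained run of the same groupby/nlargest pipeline, wrapped in a hypothetical helper top_k_per_category. It assumes each row is a (Category, Title, ProductId, Rating) tuple with a numeric rating, uses operator.itemgetter in place of the lambdas, and deliberately interleaves the categories in the sample so the sort step matters:

```python
import heapq
import itertools
import operator

# Hypothetical sample: a few rows from the question as (Category, Title, ProductId, Rating) tuples
some_list = [
    ("Electronics, Books, Bundles", "Lautner e-Reader Cover", 161553, 4),
    ("Electronics, Books, Radios", "Hd Groovy Stand In Blue- Nook", 594514, 4.1),
    ("Electronics, Books, Bundles", "Lautner Chocolate NOOK Case", 594451, 5),
    ("Electronics, Books, Bundles", "Oliver e-Reader Cover", 161685, 1),
    ("Electronics, Books, Radios", "Hutton Envelope in Bark", 161560, 2.9),
    ("Electronics, Books, Radios", "Italian Leather-Style Chesterton Cover for NOOK Reader", 161561, 4),
]


def top_k_per_category(rows, k):
    """Return the k highest-rated rows of every category, category by category."""
    by_category = operator.itemgetter(0)
    # groupby needs rows with the same category to be adjacent, hence the sort
    groups = itertools.groupby(sorted(rows, key=by_category), key=by_category)
    # nlargest keeps only the k best rows of each group, ranked by the rating at index 3
    top_k = (heapq.nlargest(k, values, key=operator.itemgetter(3)) for _, values in groups)
    return list(itertools.chain.from_iterable(top_k))


for row in top_k_per_category(some_list, 2):
    print(row)
```

heapq.nlargest is a good fit when k is small compared with the size of each group; for tiny groups a plain sorted(...)[:k] performs about the same.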
Upvotes: 4
Reputation: 1821
Probably not the most efficient solution, but an understandable one.
You want the top results per category, so first we need to identify the categories. We do this by splitting at the single quote ('), as this is the easiest indicator; the empty string before the first ' is discarded with [1:].
```python
# each element is a raw line, e.g. "'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4"
separated = [element.split("'")[1:] for element in data]
```
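To see what this split produces, here is one line from the question, assuming data is a list of such raw strings:

```python
line = "'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4"

# splitting at the quote gives: text before the first quote, the category, the rest of the row
parts = line.split("'")

print(parts)      # ['', 'Electronics, Books, Bundles', ' Lautner e-Reader Cover 161553 4']
print(parts[1:])  # ['Electronics, Books, Bundles', ' Lautner e-Reader Cover 161553 4']
```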
As we are interested in items identified by the first string, a dictionary seems like a suitable data structure.
```python
from collections import defaultdict

data_dict = defaultdict(list)
for line in separated:
    # line[0] is the category, line[1] is the rest of the row
    data_dict[line[0]].append(line)
```
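A small sketch of what the defaultdict grouping does, using three hypothetical pre-split rows in the shape produced by element.split("'")[1:]:

```python
from collections import defaultdict

# hypothetical rows, already split into [category, rest of the row]
separated = [
    ["Electronics, Books, Bundles", " Lautner e-Reader Cover 161553 4"],
    ["Electronics, Books, Covers", " Emerson Quote e-Reader Cover 161542 2.8"],
    ["Electronics, Books, Bundles", " Oliver e-Reader Cover 161685 1"],
]

data_dict = defaultdict(list)
for line in separated:
    # a category seen for the first time starts as an empty list, so append always works
    data_dict[line[0]].append(line)

print(len(data_dict["Electronics, Books, Bundles"]))  # 2
```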
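Now we have a convenient format and can sort each category's list by rating:
```python
for key in data_dict:
    # line[1] is e.g. ' Lautner e-Reader Cover 161553 4'; its last word is the rating
    data_dict[key].sort(key=lambda line: float(line[1].split()[-1]), reverse=True)
```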
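The ratings are still strings at this point. Comparing them as strings happens to work for ratings like the ones in the question, but breaks as soon as the integer part has more than one digit. A small sketch with a hypothetical rating of "10" (not in the question's data) shows why converting with float is safer:

```python
# hypothetical ratings; "10" only illustrates the pitfall
ratings = ["4", "4.3", "10", "2.8"]

# compared as strings, "10" sorts below "2.8" because "1" < "2"
as_strings = sorted(ratings, reverse=True)

# compared as numbers, the order matches the actual rating
as_numbers = sorted(ratings, key=float, reverse=True)

print(as_strings)  # ['4.3', '4', '2.8', '10']
print(as_numbers)  # ['10', '4.3', '4', '2.8']
```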
From this dictionary it is easy to produce the results:
```python
k = 2
results = []
for key in data_dict:
    # each list is already sorted, so its first k entries are the top k
    results.extend(data_dict[key][:k])
```
The key is to use a suitable data structure, here a dictionary. Here is the short solution:
```python
from collections import defaultdict

# make a dict: category -> list of raw lines
data_dict = defaultdict(list)
for line in data:
    data_dict[line.split("'")[1]].append(line)

# function working on the dict
def top_results(data_dict, k):
    results = []
    for lines in data_dict.values():
        # rank by the last word of each line, i.e. its rating
        ranked = sorted(lines, key=lambda line: float(line.split()[-1]), reverse=True)
        results.extend(ranked[:k])
    return results
```
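A self-contained usage sketch of the short solution, run on six lines from the question. This version sorts each category's lines by rating (assumed to be the last word of the line) before slicing, since slicing alone would return the first k lines rather than the best k:

```python
from collections import defaultdict


def top_results(data_dict, k):
    """Return the k highest-rated lines of every category in data_dict."""
    results = []
    for lines in data_dict.values():
        # rank by the last word of each line, i.e. its rating
        ranked = sorted(lines, key=lambda line: float(line.split()[-1]), reverse=True)
        results.extend(ranked[:k])
    return results


data = [
    "'Electronics, Books, Covers' Emerson Quote e-Reader Cover 161542 2.8",
    "'Electronics, Books, Covers' Jonathan Adler Book Reader Cover Hd - Elephant 594548 4.9",
    "'Electronics, Books, Covers' Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033 4.3",
    "'Electronics, Scanners, Covers' Canvas Bella Library Cover 161554 3",
    "'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR 161683 4",
    "'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red 594202 4.2",
]

# category (text between the first pair of quotes) -> list of raw lines
data_dict = defaultdict(list)
for line in data:
    data_dict[line.split("'")[1]].append(line)

for line in top_results(data_dict, 2):
    print(line)
```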
But it is probably better to keep working with a dictionary instead of flattening it into a less suitable list.
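A minimal sketch of that idea, with a hypothetical helper top_results_by_category that returns a dict mapping each category to its top k lines instead of a flat list (same assumptions: raw string lines, rating as the last word):

```python
from collections import defaultdict


def top_results_by_category(lines, k):
    """Map each category to its k highest-rated lines (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        grouped[line.split("'")[1]].append(line)
    # sort each category by rating and keep only the first k lines
    return {
        category: sorted(rows, key=lambda row: float(row.split()[-1]), reverse=True)[:k]
        for category, rows in grouped.items()
    }


data = [
    "'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4",
    "'Electronics, Books, Bundles' Lautner stand in e-Reader Cover 161552 3",
    "'Electronics, Books, Bundles' Lautner Chocolate NOOK Case 594451 5",
    "'Electronics, Books, Radios' Hutton Envelope in Bark 161560 2.9",
    "'Electronics, Books, Radios' Hd Groovy Stand In Blue- Nook 594514 4.1",
]

top = top_results_by_category(data, 1)
print(top["Electronics, Books, Bundles"])
```

Keeping the categories as keys means later lookups like top["Electronics, Books, Radios"] stay cheap and readable.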
To summarize: a dict fits this problem, .split("'") works for this data, and list.sort needs a key; here we just use the last word, str.split()[-1], as this is your ranking.
Upvotes: 1
Reputation: 27869
This might be what you need:
```python
data = ''''Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4
'Electronics, Books, Bundles' Lautner stand in e-Reader Cover 161552 3
'Electronics, Books, Bundles' Lautner Chocolate NOOK Case 594451 5
'Electronics, Books, Bundles' Oliver e-Reader Cover 161685 1
'Electronics, Books, Covers' Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033 4.3
'Electronics, Books, Covers' Emerson Quote e-Reader Cover 161542 2.8
'Electronics, Books, Covers' Industriell Easel e-Reader Cover 161682 3.7
'Electronics, Books, Covers' Jonathan Adler Book Reader Cover Hd - Elephant 594548 4.9
'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR 161683 4
'Electronics, Scanners, Covers' Nook Tablet Dessin Cover in Marine 161686 3.8
'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red 594202 4.2
'Electronics, Scanners, Covers' Canvas Bella Library Cover 161554 3
'Electronics, Books, Radios' Groovy Protective Stand Cover: Custom Designed for 7-inch NOOK HD 594549 3.8
'Electronics, Books, Radios' Hd Groovy Stand In Blue- Nook 594514 4.1
'Electronics, Books, Radios' Hutton Envelope in Bark 161560 2.9
'Electronics, Books, Radios' Italian Leather-Style Chesterton Cover for NOOK Reader 161561 4'''

# split every row into [quoted category, rest of the row]
groups = [item.split("' ") for item in data.split('\n')]

# map category -> list of "Title ProductId Rating" strings
grouped_data = {}
for group in groups:
    item = [group[1].strip()]
    group = group[0].strip("'")
    if group not in grouped_data:
        grouped_data[group] = item
    else:
        grouped_data[group] += item

def topN(data, n):
    """Return the n highest-rated rows; the rating is the last word of each row."""
    data = [item.split() for item in data]
    data = sorted(data, key=lambda x: float(x[-1]), reverse=True)[:n]
    data = [' '.join(item) for item in data]
    return data

result = {}
for k, v in grouped_data.items():
    result[k] = topN(v, 2)

# flatten into "Category: Title ProductId Rating" strings
final_result = [': '.join([group1, item1]) for group1, value1 in result.items() for item1 in value1]
```
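The grouping loop can also be written more compactly with dict.setdefault. A sketch of that variant, where top_n_per_group is a hypothetical name and the input is assumed to use the same one-row-per-line format as data:

```python
def top_n_per_group(text, n):
    """Return 'Category: rest of row' strings for the n highest-rated rows of each category."""
    grouped = {}
    for row in text.splitlines():
        # split only at the first "' " so titles are never cut
        category, rest = row.split("' ", 1)
        # setdefault creates the list the first time a category is seen
        grouped.setdefault(category.strip("'"), []).append(rest.strip())
    return [
        f"{category}: {item}"
        for category, items in grouped.items()
        for item in sorted(items, key=lambda entry: float(entry.split()[-1]), reverse=True)[:n]
    ]


sample = """'Electronics, Books, Radios' Hutton Envelope in Bark 161560 2.9
'Electronics, Books, Radios' Hd Groovy Stand In Blue- Nook 594514 4.1
'Electronics, Scanners, Covers' Canvas Bella Library Cover 161554 3"""

for line in top_n_per_group(sample, 1):
    print(line)
```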
Upvotes: 1