Reputation: 527
I'm interested in a neat (+ optimal) way to solve the following problem, as I'm curious how to execute this in a Pythonic way. While doing a separate task, I attempted to make a collections.Counter
dictionary with arbitrary elements:
d = Counter({('A','C'): 4, ('B','D','E'): 3, ..., ('A','G','V','X','Z'): 1, ('L','Z'): 1})
Here, each element is a tuple, consisting of 2 or more characters. My goal is to create a list with the most frequent items with duplicates uniquely by the number of elements in a tuple. For example, the solution may look like the following:
d_sample = Counter({('A', 'C'): 4, ('B', 'D', 'E'): 3, ('A' ,'D'): 3, ('C', 'D', 'E'): 3,
('A', 'B', 'C', 'D', 'E'): 2, ('A', 'C', 'D', 'E'): 1,
('B', 'C', 'D', 'E'): 1, ('D', 'E'): 1})
result = [('A', 'C'), ('B', 'D', 'E'), ('C', 'D', 'E'), ('A', 'B', 'C', 'D', 'E'),
('A', 'C', 'D', 'E'), ('B', 'C', 'D', 'E')]
Here, ('A', 'C')
is selected, as it is the tuple with two elements with the highest count. ('B', 'D', 'E')
and ('C', 'D', 'E')
are both selected as they are the tuples with three elements with the highest count. Same goes for other elements in the result
list.
The method I thought of was using a for
loop to extract several lists from d
, separated by number of elements: for i in range(2, # maximum elements using another for loop)
. From there, I counted the item(s) with the highest frequency using another for
loop and max()
.
# pseudo-example:
maxval = 2
for x in d:
if len(x[0]) > maxval:
maxval = len(x[0])
counter_list = [[] * n for n in range(maxval-2)]
for x in d:
counter_list[len(x[0])-2].append(x)
## selecting max per list
As elaborated, this doesn't seem Pythonic and optimal at all. Any insights to deliver the task more optimally would be appreciated.
Upvotes: 2
Views: 336
Reputation: 362557
This is O(n) and I don't think it will be possible to get asymptotically any better than that.
highest_count_for_length = {}
result = []
for tup, count in d.items():
try:
if highest_count_for_length[len(tup)] == count:
result.append(tup)
except KeyError:
# we haven't seen this length yet
highest_count_for_length[len(tup)] = count
result.append(tup)
It does rely on the fact that the input is already ordered by value.
Upvotes: 2