Reputation: 371
I am writing a code that returns the most common bus stop number in a bus route (from a csv file). The first entry of 'bus_routes.csv' of each row is the name of the route, while rest of the entries are the bus stop numbers.
def most_common_number(routes):
routes = load_routes('bus_routes.csv')
bus_stop_ID = [col[1:] for col in routes]
return max(sum(set((bus_stop_ID, []), key = sum(bus_stop_ID, []).count)
When I run this code, it returns an error; TypeError: can only concatenate list (not "tuple") to list type(bus_stop_number) is a list, so I cannot find where a tuple exists in my last code.
Upvotes: 3
Views: 152
Reputation: 500713
I cannot find where a tuple exists in my last code
Each element of bus_stop_ID
is tuple.
To recap, you have a list of variable-length tuples and you'd like to find the most common element across that entire structure.
This sounds like a perfect use for collections.Counter
:
>>> from collections import Counter
>>> bus_stop_ID = [('id1', 'id2', 'id3'), ('id1',), ('id1', 'id3', 'id5')]
>>> [(stop_id, count)] = sum(map(Counter, bus_stop_ID), Counter()).most_common(1)
>>> stop_id
'id1'
>>> count
3
Here:
map(Counter, bus_stop_ID)
counts bus stop ids in every route separately;sum(..., Counter())
aggregates the counts across routes;.most_common(1)
returns the most common bus stop id in the aggregated counts (as a single-element list containing a 2-tuple that consists of a bus stop id and its count).An alternative is to fist concatenate all the routes (using sum()
) and then do the counting:
>>> [(stop_id, count)] = Counter(sum(bus_stop_ID, ())).most_common(1)
This is shorter but less efficient since it needs to concatenate all the routes into a single (potentially giant) tuple.
A — somewhat fancy — way to sidestep this inefficiency is by using itertools.chain()
:
>>> Counter(itertools.chain(*bus_stop_ID)).most_common(1)
[('id1', 3)]
This simply iterates over all bus stop ids in the first route, then the second and so on, doing the counting along the way.
Out of curiosity, I benchmarked all three methods on an arbitrarily chosen list of 10,000 tuples and the results are quite interesting:
In [11]: %timeit sum(map(Counter, bus_stop_ID), Counter()).most_common(1)
1 loop, best of 3: 235 ms per loop
In [12]: %timeit Counter(sum(bus_stop_ID, ())).most_common(1)
1 loop, best of 3: 2.64 s per loop
In [13]: %timeit Counter(itertools.chain(*bus_stop_ID)).most_common(1)
10 loops, best of 3: 17.3 ms per loop
Upvotes: 4