Reputation: 8551
I have a few lists, each containing several cities. For any two given elements, I need to check whether they belong to the same list.
Simple example:
list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon', ...]
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston', ...]
Expected results:
>>> in_same_group('London', 'Liverpool')
True
>>> in_same_group('Berlin', 'Washington')
False
The function is called very often, so speed is critical. The biggest list might have up to 1000 elements.
What would be the most efficient way to do it?
This is what I tried so far, but it is far too slow:
def in_same_group(city1, city2):
    same_group = False
    for this_list in [list1, list2, list3...]:
        if city1 in this_list and city2 in this_list:
            return True
    return same_group
Upvotes: 14
Views: 4029
Reputation: 91
In database terms, you have a one-to-many relationship. One list can contain many names, but each name can only appear in one list.
Try this:
In [20]: city_lists = {city_name:list1 for city_name in list1}
In [21]: city_lists.update({city_name:list2 for city_name in list2})
In [22]: city_lists
Out[22]:
{'Cork': ['Dublin', 'Cork', 'Galway'],
'Dublin': ['Dublin', 'Cork', 'Galway'],
'Edimburgh': ['London', 'Manchester', 'Liverpool', 'Edimburgh'],
'Galway': ['Dublin', 'Cork', 'Galway'],
'Liverpool': ['London', 'Manchester', 'Liverpool', 'Edimburgh'],
'London': ['London', 'Manchester', 'Liverpool', 'Edimburgh'],
'Manchester': ['London', 'Manchester', 'Liverpool', 'Edimburgh']}
In [23]: city_lists['Cork'] is city_lists['Dublin']
Out[23]: True
In [24]: city_lists['Cork'] is city_lists['London']
Out[24]: False
This is very efficient. If you visualize the code with http://pythontutor.com you will see the dict contains references to the original lists, but the lists themselves are not copied.
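If it helps, the same idea can be extended to all four lists and wrapped into the function from the question (a quick sketch, assuming the lists above; nothing is copied here either, every city just points at its original list object):

city_lists = {}
for group in [list1, list2, list3, list4]:
    city_lists.update({city: group for city in group})

def in_same_group(city1, city2):
    # identity check: both names must map to the very same list object
    return city1 in city_lists and city2 in city_lists and city_lists[city1] is city_lists[city2]

print(in_same_group('London', 'Liverpool'))  # True
print(in_same_group('Cork', 'Berlin'))       # False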
Upvotes: 0
Reputation: 8551
After running some speed tests on the proposals, I realized that (if I am not mistaken) a weak point of the proposed solutions is that the complete loop over all lists is always executed to build the mapping. Exiting early, as soon as the result is already known, could improve speed.
In addition, if one list is much larger than all the others, there is some potential to speed things up.
I think the following could be faster in that case; the assumption is that list3 is much larger than the others.
list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon', ...] #assuming list3 is much larger than all others
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston', ...]
A possibility would be:
def in_same_group(city1, city2):
    for group in [list1, list2, list4]:  # note that list3, the largest, is skipped
        in1 = city1 in group
        in2 = city2 in group
        if in1 != in2:    # exactly one of the two cities is in this smaller group
            return False
        if in1 and in2:   # both cities are in this smaller group
            return True
    return True  # neither city was found in a smaller group, so both belong to the biggest group
Upvotes: 0
Reputation: 54303
Here's a mix of Horia's proposal and my original one. You can define a dict with cities as keys and sets of indices as values:
list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway', 'Paris', 'Rome']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon']
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston']
# Note that 'Paris' and 'Rome' are both in list2 and list3
groups = [list1, list2, list3, list4]
indices = {}
for i, group in enumerate(groups):
    for city in group:
        indices.setdefault(city, set()).add(i)
The structure is compact and looks like this:
print(indices)
#{'London': {0}, 'Manchester': {0}, 'Liverpool': {0}, 'Edimburgh': {0}, 'Dublin': {1}, 'Cork': {1}, 'Galway': {1}, 'Paris': {1, 2}, 'Rome': {1, 2}, 'Berlin': {2}, 'Munich': {2}, 'Frankfurt': {2}, 'Milan': {2}, 'Madrid': {2}, 'Barcelona': {2}, 'Lisbon': {2}, 'Washington': {3}, 'New York': {3}, 'San Francisco': {3}, 'LA': {3}, 'Boston': {3}}
For any city pair, you can get a set of common indices thanks to set intersection:
def common_groups(city1, city2):
    return indices.get(city1, set()) & indices.get(city2, set())
print(common_groups('London', 'Liverpool'))
# {0}
print(common_groups('London', 'Paris'))
# set()
print(common_groups('Cork', 'Paris'))
# {1}
print(common_groups('Rome', 'Paris'))
# {1, 2}
print(common_groups('Rome', 'Nowhere'))
# set()
An empty set is falsey in Python.
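So if a plain boolean is preferred, a thin wrapper over common_groups should be enough (a quick sketch):

def in_same_group(city1, city2):
    # a non-empty intersection means the two cities share at least one group
    return bool(common_groups(city1, city2))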
With n cities, creating the dict will be O(n), the space requirement should be O(n), and lookup performance will be O(1). As a bonus, the query doesn't just return a boolean but a set of indices.
Finally, thanks to set intersections, this method would also work if you want to check that three or more cities are in the same group.
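For instance, a possible generalization along those lines (a sketch; the name common_groups_many is only illustrative):

def common_groups_many(*cities):
    # intersect the index sets of every city passed in
    sets = [indices.get(city, set()) for city in cities]
    return set.intersection(*sets) if sets else set()

print(common_groups_many('Paris', 'Rome', 'Cork'))  # {1}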
Upvotes: 21
Reputation: 146
import pandas as pd
list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon']
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston']
a = pd.Series(list(list1))
b = pd.Series(list(list2))
c = pd.Series(list(list3))
d = pd.Series(list(list4))
lists = [a,b,c,d]
for i in lists:
    if i.isin(['London']).any() and i.isin(['Manchester']).any():
        print('Same Group')
    else:
        print('Different Group')
Same Group
Different Group
Different Group
Different Group
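If needed, the same check can be wrapped into a function with the question's signature (a sketch reusing the lists of Series defined above):

def in_same_group(city1, city2):
    # True if any of the Series contains both cities
    return any(s.isin([city1]).any() and s.isin([city2]).any() for s in lists)

print(in_same_group('London', 'Manchester'))  # True
print(in_same_group('Berlin', 'Washington'))  # False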
Upvotes: 0
Reputation: 149155
If you want it to be really fast, you should reverse your data structure and use a dictionary of sets, where the key is a town and the value is the set of all towns in the same group. That way, in_same_group only needs one dictionary lookup and one set membership test.
As those accesses are optimized for dictionaries and sets, the lookup should be as fast as it can be.
Code could be:
import collections

h = collections.defaultdict(set)
lists = [list1, list2, list3, list4]
for l in lists:
    for town in l:
        for other in l:
            if town != other:
                h[town].add(other)
The function is now as simple as:
def in_same_group(t1, t2):
    return t2 in h[t1]
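Example calls, assuming the four lists from the question have been loaded into h as above:

print(in_same_group('London', 'Liverpool'))   # True
print(in_same_group('Berlin', 'Washington'))  # False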
Upvotes: 1
Reputation: 532268
First, use sets, not lists (and use a list of sets instead of separate variables).
master_list = []
master_list.append(set(['London', 'Manchester', 'Liverpool', 'Edimburgh']))
master_list.append(set(['Dublin', 'Cork', 'Galway']))
master_list.append(set(['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon', ...]))
master_list.append(set(['Washington', 'New York', 'San Francisco', 'LA', 'Boston', ...]))
(Depending on your use case, a dict with more meaningful keys may be more appropriate than a list.)
Second, build a dict that maps each element to its set:
# E.g., index['London'] == set(['London', 'Manchester', ...])
index = dict((item, s) for s in master_list for item in s)
Now, you just need to check if both items belong to the same set.
def in_same_group(i1, i2):
    return index[i1] is index[i2]
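Example calls (note that a city missing from every set would raise a KeyError with this version):

print(in_same_group('London', 'Liverpool'))   # True  - same set object
print(in_same_group('Berlin', 'Washington'))  # False - different set objects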
Upvotes: 5
Reputation: 8781
One approach would be to build a map from the city to its group number. So you'd build something like:
mapping = {
    'London': 1,
    ...,
    'Berlin': 3
    ...
}
Then your in_same_group
function can be:
def in_same_group(item1, item2):
    gr1 = mapping[item1]
    gr2 = mapping[item2]
    return gr1 == gr2
In terms of speed, this is quite fast: it's just two dictionary lookups, which are very fast in Python, and one comparison, which is again quite fast. In big-O terms the function is O(1).
But it assumes each element is part of only one group, which seems to be the case in the example you provided.
You do have to spend extra time and memory to actually build the map, but that cost is amortized over all the calls to in_same_group. OTOH, you probably wouldn't get away without building some indexing structure, regardless of your approach.
The code to build the mapping would be:
def build_mapping(groups):
    mapping = {}
    for i in range(0, len(groups)):
        for g in groups[i]:
            mapping[g] = i
    return mapping
It's not the prettiest code but it gets the job done.
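For example, with the lists from the question (build_mapping assigns 0-based group indices):

mapping = build_mapping([list1, list2, list3, list4])
print(in_same_group('London', 'Liverpool'))   # True
print(in_same_group('Berlin', 'Washington'))  # False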
Upvotes: 8
Reputation: 71471
You can iterate through the lists and determine whether both search queries are found in the same list, then return the boolean value of the newly formed list.
def search(s1, s2):
    list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
    list2 = ['Dublin', 'Cork', 'Galway']
    list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon']
    list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston']
    return bool([i for i in [list1, list2, list3, list4] if s1 in i and s2 in i])
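For example, calling it with the pairs from the question:

print(search('London', 'Liverpool'))   # True
print(search('Berlin', 'Washington'))  # False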
Upvotes: 2