J0ANMM

Reputation: 8551

How to find out if two elements belong to the same list

I have a few lists, each containing several cities. Given any two cities, I need to check whether they belong to the same list.

Simple example:

list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon', ...]
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston', ...]

Expected results:

> in_same_group('London', 'Liverpool')
> True
>
> in_same_group('Berlin', 'Washington')
> False

The function is called very often, so speed is critical. The biggest list might have up to 1000 elements.

What would be the most efficient way to do it?


This is what I tried so far, but it is far too slow:

def in_same_group(city1, city2):

    same_group = False
    for this_list in [list1, list2, list3...]:
        if city1 in this_list and city2 in this_list:
            return True

    return same_group

Upvotes: 14

Views: 4029

Answers (8)

Marc Poulin

Reputation: 91

In database terms, you have a one-to-many relationship. One list can contain many names, but each name can only appear in one list.

Try this:

In [20]: city_lists = {city_name:list1 for city_name in list1}
In [21]: city_lists.update({city_name:list2 for city_name in list2})
In [22]: city_lists
Out[22]: 
{'Cork': ['Dublin', 'Cork', 'Galway'],
 'Dublin': ['Dublin', 'Cork', 'Galway'],
 'Edimburgh': ['London', 'Manchester', 'Liverpool', 'Edimburgh'],
 'Galway': ['Dublin', 'Cork', 'Galway'],
 'Liverpool': ['London', 'Manchester', 'Liverpool', 'Edimburgh'],
 'London': ['London', 'Manchester', 'Liverpool', 'Edimburgh'],
 'Manchester': ['London', 'Manchester', 'Liverpool', 'Edimburgh']}
In [23]: city_lists['Cork'] is city_lists['Dublin']
Out[23]: True
In [24]: city_lists['Cork'] is city_lists['London']
Out[24]: False

This is very efficient. If you visualize the code with http://pythontutor.com you will see the dict contains references to the original lists, but the lists themselves are not copied.
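For completeness, a lookup function could simply compare the referenced list objects by identity (a small sketch; the name in_same_group matches the question, and an unknown city would raise a KeyError):

def in_same_group(city1, city2):
    # Two cities share a group exactly when they reference the very same list object
    return city_lists[city1] is city_lists[city2]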

Upvotes: 0

J0ANMM

Reputation: 8551

After running some speed tests with the proposals, I realized that a weak point of the proposed solutions (if I am not mistaken) is that the complete loop over all lists is always done to build the mapping. Exiting the loop early, as soon as the result is known, could improve speed.

In addition, if one list is much larger than all the others, there is some potential to speed things up by skipping it.

I think the following could be faster when one of the lists is much larger than the others. The assumption here is that list3 is much larger than the rest.

list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon', ...] #assuming list3 is much larger than all others
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston', ...]

A possibility would be:

def in_same_group(city1, city2):

    for group in [list1, list2, list4]:  # note that list3, the largest, is skipped
        if city1 in group or city2 in group:
            # early exit: both cities must be in this smaller group, otherwise they differ
            return city1 in group and city2 in group

    return True  # if this point is reached, both cities belong to the biggest group
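With the four example lists defined, a quick sanity check:

print(in_same_group('London', 'Liverpool'))    # True
print(in_same_group('Berlin', 'Washington'))   # False
print(in_same_group('Berlin', 'Munich'))       # True (both only in the big, skipped list)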

Upvotes: 0

Eric Duminil

Reputation: 54303

Dict of sets of indices

Here's a mix of Horia's proposal and my original one. You can define a dict with cities as key and sets of indices as values:

list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway', 'Paris', 'Rome']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon'] 
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston']
# Note that 'Paris' and 'Rome' are both in list2 and list3

groups = [list1, list2, list3, list4]

indices = {}

for i, group in enumerate(groups):
    for city in group:
        indices.setdefault(city, set()).add(i)

The structure is compact and looks like this:

print(indices)
#{'London': {0}, 'Manchester': {0}, 'Liverpool': {0}, 'Edimburgh': {0}, 'Dublin': {1}, 'Cork': {1}, 'Galway': {1}, 'Paris': {1, 2}, 'Rome': {1, 2}, 'Berlin': {2}, 'Munich': {2}, 'Frankfurt': {2}, 'Milan': {2}, 'Madrid': {2}, 'Barcelona': {2}, 'Lisbon': {2}, 'Washington': {3}, 'New York': {3}, 'San Francisco': {3}, 'LA': {3}, 'Boston': {3}}

For any city pair, you can get a set of common indices thanks to set intersection:

def common_groups(city1, city2):
    return indices.get(city1, set()) & indices.get(city2, set())

print(common_groups('London', 'Liverpool'))
#  {0}
print(common_groups('London', 'Paris'))
#  set()
print(common_groups('Cork', 'Paris'))
# {1}
print(common_groups('Rome', 'Paris'))
# {1, 2}
print(common_groups('Rome', 'Nowhere'))
# set()

An empty set is falsey in Python.

With n cities, creating the dict will be O(n), space requirement should be O(n) and lookup performance will be O(1). As a bonus, the query doesn't just return a boolean but a set of indices.

Finally, thanks to set intersections, this method would also work if you want to check that three or more cities are in the same group.
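For instance, a boolean wrapper that also accepts more than two cities could look like this (a sketch built on the indices dict above; the in_same_group name is taken from the question):

def in_same_group(city1, city2, *more):
    # Intersect the index sets of all given cities; an empty intersection means no common group
    sets = [indices.get(city, set()) for city in (city1, city2) + more]
    return bool(set.intersection(*sets))

print(in_same_group('London', 'Liverpool'))      # True
print(in_same_group('Rome', 'Paris', 'Berlin'))  # True (all three share group 2)
print(in_same_group('London', 'Paris'))          # False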

Upvotes: 21

Mario Rojas

Reputation: 146

You can try using Pandas for this:

import pandas as pd

list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
list2 = ['Dublin', 'Cork', 'Galway']
list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon']
list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston']

a = pd.Series(list1)
b = pd.Series(list2)
c = pd.Series(list3)
d = pd.Series(list4)

lists = [a, b, c, d]

for i in lists:
    if i.isin(['London']).any() and i.isin(['Manchester']).any():
        print('Same Group')
    else:
        print('Different Group')
Results

Same Group

Different Group

Different Group

Different Group

Upvotes: 0

Serge Ballesta

Reputation: 149155

If you want it to be really fast, you should reverse your data structure to a dictionary of sets, where each key is a town and the corresponding set contains all towns in the same group. That way you can make sure that in_same_group only needs:

  • one single dictionary lookup
  • one single set containment check

As both accesses are optimized for dictionaries and sets, the lookup should be as fast as it can be.

Code could be:

import collections

# map each town to the set of all other towns in its group
h = collections.defaultdict(set)

lists = [list1, list2, list3, list4]
for l in lists:
    for town in l:
        for other in l:
            if town != other:
                h[town].add(other)

The function is now as simple as:

def in_same_group(t1, t2):
    return t2 in h[t1]
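With the sample lists from the question, a quick check (assuming the dictionary h has been built as above):

print(in_same_group('London', 'Liverpool'))   # True
print(in_same_group('Berlin', 'Washington'))  # False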

Upvotes: 1

chepner

Reputation: 532268

First, use sets, not lists (and use a list of sets instead of separate variables).

master_list = []
master_list.append(set(['London', 'Manchester', 'Liverpool', 'Edimburgh']))
master_list.append(set(['Dublin', 'Cork', 'Galway']))
master_list.append(set(['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon', ...]))
master_list.append(set(['Washington', 'New York', 'San Francisco', 'LA', 'Boston', ...]))

(Depending on your use case, a dict with more meaningful keys may be more appropriate than a list.)

Second, build a dict that maps each element to its set:

# E.g., index['London'] == set(['London', 'Manchester', ...])
index = dict((item, s) for s in master_list for item in s)

Now, you just need to check if both items belong to the same set.

def in_same_group(i1, i2):
    return index[i1] is index[i2]
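With the sample data this gives the expected results (a small usage sketch; note that an unknown city would raise a KeyError):

print(in_same_group('London', 'Liverpool'))    # True: both map to the very same set object
print(in_same_group('Berlin', 'Washington'))   # False: they map to different set objects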

Upvotes: 5

Horia Coman

Reputation: 8781

One approach would be to build a map from each city to its group number. So you'd build something like:

mapping = {
    'London': 1,
    ...,
    'Berlin': 3
    ...
}

Then your in_same_group function can be:

def in_same_group(item1, item2):
    gr1 = mapping[item1]
    gr2 = mapping[item2]
    return gr1 == gr2

In terms of speed, this is quite fast: it's just two dictionary lookups, which are very fast in Python, and one comparison, which is again quite fast. In big-O terms the function is O(1).

But it assumes each element is only part of one group, which seems to be the case in the example you provided.

You do have to spend the extra time and memory of actually building the map, but that cost is amortized over all the calls to in_same_group. On the other hand, you probably cannot avoid building some indexing structure regardless of your approach.


The code to build the mapping would be:

def build_mapping(groups):
    mapping = {}
    for i in range(0, len(groups)):
        for g in groups[i]:
            mapping[g] = i
    return mapping

It's not the prettiest code but it gets the job done.
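Putting it together with the lists from the question (a short sketch using the build_mapping helper above):

mapping = build_mapping([list1, list2, list3, list4])

print(in_same_group('London', 'Liverpool'))   # True  (both in group 0)
print(in_same_group('Berlin', 'Washington'))  # False (groups 2 and 3)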

Upvotes: 8

Ajax1234

Reputation: 71471

You can iterate through the lists and collect those that contain both search queries, then return the boolean value of the resulting list.

def search(s1, s2):
   list1 = ['London', 'Manchester', 'Liverpool', 'Edimburgh']
   list2 = ['Dublin', 'Cork', 'Galway']
   list3 = ['Berlin', 'Munich', 'Frankfurt', 'Paris', 'Milan', 'Rome', 'Madrid', 'Barcelona', 'Lisbon']
   list4 = ['Washington', 'New York', 'San Francisco', 'LA', 'Boston']
   return bool([i for i in [list1, list2, list3, list4] if s1 in i and s2 in i])
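For example:

print(search('London', 'Liverpool'))   # True
print(search('Berlin', 'Washington'))  # False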

Upvotes: 2
