Reputation: 81
Say I have two large lists, list_of_A_objects which contains objects of class A, and list _of_B_objects which contains objects of class B.
They both have string members.
I want to be able to search through all the elements in the two lists and if the string member of an A object is a sub-string of the string member of a B object I want it to do something.
What I've got below is fine if the lists are quite small, but if the lists are large it can take a long time.
Is there a way to make this faster. I've been thinking about using dictionaries in some way because they have fast lookups but I cant figure it out.
This is what I have so far.
class A:
def __init__(self, x):
self.string = x
class B:
def __init__(self,x):
self.string = x
list_of_A_objects = get_large_list_of_A_objects()
list_of_B_objects = get_large_list_of_B_objects()
for A_object in list_of_A_objects:
for B_Object in list_of_B_objects:
if A_object.string in B_Object.string:
do_something()
Upvotes: 4
Views: 295
Reputation: 32392
Here's a python implementation using a dictionary. First convert one of the lists into a indexed by its object's strings
a_map = {}
for A_object in list_of_A_objects:
a_map[A_object.string] = A_object
Then for each object in the other list, check whether the object's string exists in the dictionary (in constant time), and if so do_something
for B_object in list_of_B_objects:
if B_object.string in a_map:
do_something(a_map[B_object.string])
This assumes that each A_object has a unique string. If that's not the case then you can make the values of a_map an array of objects instead of a single object.
Upvotes: 0
Reputation: 133975
One thing you can do is create a single string from the B objects. While building that, you also create a list of indexes, so you know the index of each string in the larger string. See the code below.
Note that I'm not a python programmer, so you'll have to interpret my pseudo-code.
BStrings = ""
list_of_Indexes = new list of int
for B_object in list_of_B_objects
list_of_Indexes.Add(length of BStrings)
BStrings = BStrings + B_Object.string + newline
Now, you can search the BStrings string for each A_object. If the string is found, the function returns the index of where it was found in the string. You can then binary search the list_of_indexes to determine which B_object contains that string.
This doesn't really change the complexity of the operation (it's still MxN, where M is the number of objects in the A list, and N is the length of the B list), but searching a single string for substrings will be faster than looping over the B list because it avoids the overhead of setting up the search.
If even that is too slow, then you'll want to use something like the Aho-Corasick string matching algorithm. There's probably a decent Python implementation available.
Upvotes: 2