Reputation: 1
I have a multiple lists with data of the form:(There is a simple example, in fact, the dimension of row-vectors are much larger)
list 1: [num1]
[[1,0,0,1,0], [0,0,1,0,1], [0,1,0,1,0], ...]
list 2: [num2]
[[0,0,0,1,0], [1,0,0,1,0], [0,0,1,0,0], ...]
...
list n: [numn]
[[1,1,0,1,0], [1,0,0,1,1], [0,0,1,0,1], ...]
Every list marked with its own number [num]
(numbers are not repeated).
The main question is: How to efficently find all num's of lists with identical row-vectors from them and such vectors?
In details:
For example, the row-vector [1,0,0,1,0]
occurs in list 1 and list 2, so then I should return [1,0,0,1,0] : [num1], [num2]
First of all hash tables come to mind. I think it's best to use due to the large amount of data but I know hash tables quite superficially and I can’t structurize a clear algorithm in my head with this case. Can anyone advise what should I pay attention to and what modules should I consider? Perhaps there are other efficient approaches?
Upvotes: 0
Views: 90
Reputation: 11903
It is beyond the scope of a regular question to dive into hash tables and such. But suffice to say that sets
in Python are backed by hash tables and checking for set membership is almost instantaneous and much more efficient than searching through lists.
If order doesn't matter within your list of vectors, you should just think of them as unordered collections (sets). Sets need to contain immutable things, so you cannot put a list into a set, but you can put in tuples. So, if you re-structure your data to be sets of tuples, you are in good shape.
You have many "cases" of things you might do then, below are a few examples.
data = { 1: {(1, 0, 0), (1, 1, 0)},
2: {(0, 0, 0), (1, 0, 0)},
3: {(1, 0, 0), (1, 0, 1), (1, 1, 0)}}
# find common vectors in 2 sets
def common_vecs(a, b):
return a.intersection(b)
# find all the common vectors in a group of sets
def all_common_vecs(grps):
return set.intersection(*grps)
# find which sets contain a specific vector
def find(vec, data):
result = set()
for idx, grp in data.items():
if vec in grp:
result.add(idx)
return result
print(common_vecs(data[1], data[3]))
print(all_common_vecs(data.values()))
print(find((1,0,1), data))
{(1, 0, 0), (1, 1, 0)}
{(1, 0, 0)}
{3}
Upvotes: 1