Reputation: 874
The context and the problem below are only examples that can help to visualize the question.
Context: Let's say that I'm continously generating random binary vectors G with length 1x64 (whose values are either 0 or 1).
Problem: I don't want to check vectors that I've already checked, so I want to create a kind of table that can identify what vectors are already generated before.
So, how can I identify each vector in an optimized way?
My first idea was to convert the binary vectors into decimal numbers. Due to the maximum length of the vectors, I would need 2^64 = 1.8447e+19
numbers to encode them. That's huge, so I need an alternative.
I thought about using hexadecimal coding. In that case, if I'm not wrong, I would need nchoosek(16+16-1,16) = 300540195
elements, which is also huge.
So, there are better alternatives? For example, a kind of hash function that can identify that vectors without repeating values?
Upvotes: 2
Views: 280
Reputation: 10865
So you have 64 bit values (or vectors) and you need a data structure in order to efficiently check if a new value is already existing?
Hash sets or binary trees come to mind, depending on if ordering is important or not.
Matlab has a hash table in containers.Map.
Here is a example:
tic;
n = 1e5; % number of random elements
keys = uint64(rand(n, 1) * 2^64); % random uint64
% check and add key if not already existing (using a containers.Map)
map = containers.Map('KeyType', 'uint64', 'ValueType', 'logical');
for i = 1 : n
key = keys(i);
if ~isKey(map, key)
map(key) = true;
end
end
toc;
However, depending on why you really need that and when you really need to check, the Matlab function unique
might also be something for you.
Just throwing out duplicates once at the end like:
tic;
unique_keys = unique(keys);
toc;
is in this example 300 times faster than checking every time.
Upvotes: 1