Reputation: 12107
I have an array of 64 bits numbers where each bit represent an index in an array we will call A. The data in the rows is STATIC and preprocessing time is not important (within reason), the data in the array A varies. There are about 32 million rows to process, so I am trying to find how to get the fastest execution time.
The process is simple; For example, if I have the following numbers:
Row[0] = ...001110, Result[0] = A[1] | A[2] | A[3]
Row[1] = ...001101, Result[1] = A[0] | A[2] | A[3]
The method I thought about would find the largest bit patterns in the list and store temporary results:
For example, we see the 001100 pattern repeated, so we could save some steps:
...001110 => calculate A[1], A[2], A[3], store B[0] = (A[2] | A[3]), result = A[1] | B[0]
...001101 => calculate A[0], result = A[0] | B[0]
...110001 etc...
Ideally, I could make a tree that tells me how to build every possible combination in use (maybe treating the rows by chunk if it gets too large).
Another example:
...0011010
...1101010
...1100010
...1000110
I can calculate 1100000 and 0001010 one time and reuse the results 2 times, but 0100010 can be reused 3 times.
This is to speed up the multiplication of boolean matrices; I am assuming there are algorithms that already exist.
Upvotes: 1
Views: 100
Reputation: 28839
This solution seems a bit too obvious, so I may be missing something, but ...
If each element in A is identified by a bit in Row, and Row is of type uint64, then it follows that A can have at most 64 elements.
If elements of A are in turn Boolean values, as your examples suggest, why not simply have A as a uint64 as well, rather than an array of bool?
Then you could simply do
Result[i] = (Row[i] & A) != 0;
which would be really hard to beat performance-wise.
Upvotes: 0
Reputation: 343
If there are enough repetitions (from your question I assume there are) so that the caching would speed up things, one way to do this would be use a hash to cache the patterns that have already been calculated. Something like this (pseudocode):
for(Row)
{
Pattern := Patterns[Row];
if(Exists(Hash(Pattern)))
{
Row_result := Hash(Pattern);
}
else
{
Row_result := Calculate(Pattern);
Hash(Pattern) := Row_result;
}
}
Now back to real life, you've said that the patterns are 64 bit long. If there are a lot of different patterns, the hash memory requirements would be huge. To mitigate this I suggest splitting each pattern into two halves and use two hashes, 1ts one to hash upper half, 2nd to hash lower part of the pattern. Then Row_result := Hash1(Upper_half) | Hash2(Lower_part)
Having done this you'll lower memory consumption to a manageable several gigabytes in worst case. You can go further and make 4 hashes to make it even lower.
Upvotes: 1