Reputation: 23
I have the following problem: Given 15 bit strings of 1024 bit each, what is the best way to find patterns in different strings on the same position? How the pattern looks like doesn't matter to me (length > 1 of course), I just want to find parts (as long as possible) where at least two of the strings match. An example:
Here i would like to get 001 from the first two strings (position 2 to 4, frequency 2 would also be great...) and 00 from the second and third string (position 1 to 2).
I hope, the problem is clear now... Anybody an idea? Thanks!
Upvotes: 2
Views: 175
Reputation: 5421
I would try to solve your problem in a two-step approach. Since you only have 15 bitstrings, it should be sufficient to compare all pairs of bitstrings (105 comparisons á 16 QWORDS should be doable)
For now, I'll also consider patterns of length 1 valid, we'll see how to get rid of this later. Since you don't mention a programming language, I'll try to keep it general and use C syntax for bit operations. Let a
and b
be two bitstrings.
i
th bit is part of a pattern if and only if a[i] == b[i]
. We can mark these bits efficiently by computing the bitwise xor (which corresponds to !=
) and negating: patterns = ~(a ^ b)
a
and b
are consecutive 1s in pattern
. Finding such long patterns can e.g. be solved by repeated shifting and AND operations, see this question for details. If you expect really long sequences, I would try to use some dedicated instructions to find non-pattern bits, I'll extend the answer if necessary.If this is really high-throughput code, you could try doing multiple comparisons simultaneously, i.e. using SIMD instructions, but that's a topic for another question.
Upvotes: 1