asdfjklö1234
asdfjklö1234

Reputation: 23

Pattern search over various bitstrings

I have the following problem: Given 15 bit strings of 1024 bit each, what is the best way to find patterns in different strings on the same position? How the pattern looks like doesn't matter to me (length > 1 of course), I just want to find parts (as long as possible) where at least two of the strings match. An example:

Here i would like to get 001 from the first two strings (position 2 to 4, frequency 2 would also be great...) and 00 from the second and third string (position 1 to 2).

I hope, the problem is clear now... Anybody an idea? Thanks!

Upvotes: 2

Views: 175

Answers (1)

Tobias Ribizel
Tobias Ribizel

Reputation: 5421

I would try to solve your problem in a two-step approach. Since you only have 15 bitstrings, it should be sufficient to compare all pairs of bitstrings (105 comparisons á 16 QWORDS should be doable)

For now, I'll also consider patterns of length 1 valid, we'll see how to get rid of this later. Since you don't mention a programming language, I'll try to keep it general and use C syntax for bit operations. Let a and b be two bitstrings.

  1. Identify the possible patterns: The ith bit is part of a pattern if and only if a[i] == b[i]. We can mark these bits efficiently by computing the bitwise xor (which corresponds to !=) and negating: patterns = ~(a ^ b)
  2. Identify longest pattern: After the transformation above has been computed, common patterns between a and b are consecutive 1s in pattern. Finding such long patterns can e.g. be solved by repeated shifting and AND operations, see this question for details. If you expect really long sequences, I would try to use some dedicated instructions to find non-pattern bits, I'll extend the answer if necessary.

If this is really high-throughput code, you could try doing multiple comparisons simultaneously, i.e. using SIMD instructions, but that's a topic for another question.

Upvotes: 1

Related Questions