edrezen
edrezen

Reputation: 1823

Concatenation of prefixes of a boolean array

I have a boolean array A of size n that I want to transform in the following way : concatenate every prefix of size 1,2,..n.

For instance, for n=5, it will transform "abcde" into "aababcabcdabcde".

Of course, a simple approach can loop over every prefix and use elementary bit operations (shift, mask, plus); so the complexity of this approach is obviously O(n).

There are some well known tricks regarding bit manipulations like here.

My question is: is it possible to achieve a faster algorithm for the transformation described above with a complexity better than O(n) by using bit manipulations ?

I am aware that the interest of such an improvement may be just theoretical because the simple approach could be the fastest in practice, but I am still curious about the fact that there exists a theoretical improvement or not.

As a precision, I need to perform this transformation p times, where p can be much bigger than n; some pre-computations could be done for a given n and used later for the computation of the p transformations.

Upvotes: 0

Views: 126

Answers (2)

edrezen
edrezen

Reputation: 1823

I may have found another way to proceed.

The idea is to use multiplication to propagate the initial input I to the correct position. The multiplication coefficient J is the vector whose bits are set to one at position i*(i-1)/2 for i in [1:n].

However, a direct multiplication I by J will provide many unwanted terms, so the idea is to

  1. mask some bits from vectors I and J
  2. multiply these masked vectors
  3. remove some junk bits from the result.

We have thus several iterations to do; the final result is the sum of the intermediate results. We can write the result as "sum on i of ((I & Ai) * Bi) & Ci", so we have 2 masks and 1 multiplication per iteration (Ai, Bi and Ci are constants depending on n).

This algorithm seems to be O(log(n)), so it is better than O(log(n)^2) BUT it needs multiplication which may be costly. Note also that this algorithm requires registers of size n*(n+1)/2 which is better than n^2.

Here is an example for n=7

Input:
  I = abcdefg

We set J = 1101001000100001000001

We also note temporary results:
  Xi = I & Ai
  Yi = Xi * Bi
  Zi = Yi & Ci

iteration 1
----------------------------
1                              A1
11 1  1   1    1     1         B1
11 1  1   1    1     1         C1
----------------------------
a                              X1
aa a  a   a    a     a         Y1   
aa a  a   a    a     a         Z1


iteration 2
----------------------------
 11                            A2
 1 1  1   1    1     1         B2
  1 11 11  11   11    11       C2
----------------------------
 bc                            X2
  bcbc bc  bc   bc    bc       Y2
  b bc bc  bc   bc    bc       Z2


iteration 3
----------------------------
   1111                        A3
      1   1    1     1         B3
         1   11   111   1111   C3
----------------------------
   defg                        X3
         defgdefg defg  defg   Y3
         d   de   def   defg   Z3


FINAL SUM
----------------------------
aa a  a   a    a     a         Z1
  b bc bc  bc   bc    bc       Z2
         d   de   def   defg   Z3
----------------------------
aababcabcdabcdeabcdefabcdefg   SUM

Upvotes: 0

user555045
user555045

Reputation: 64913

I'm not sure if this is what you're looking for, but here is a different algorithm, which may be interesting depending on your assumptions.

First compute two masks that depend only on n, so for any particular n these are just constants:

  • C (copy mask), a mask that has every n'th bit set and is n² bits long. So for n = 5, C = 0000100001000010000100001. This will be used to create n copies of A concatenated together.
  • E (extract mask), a mask that indicates which bits to take from the big concatenation, which is built up from n times a block of n bits, with values 1, 3, 7, 15 ... eg for n = 5, E = 1111101111001110001100001. Pad the left with zeroes if necessary.

Then the real computation that takes an A and constructs the concatenation of prefixes is:

pext(A * C, E)

Where pext is compress_right, discarding the bits for which the extraction mask is 0 and compacting the remaining bits to the right.

The multiplication can be replaced by a "doubling" technique like this: (which can also be used to compute the C mask)

l = n
while l < n²:
    A = A | (A << l)
    l = l * 2

Which in general produces too many concatenated copies of A but you can just pretend the excess isn't there by not looking at it (the pext drops any excess input anyway). Efficiently producing E for an unknown and arbitrarily large n seems harder, but that's not really a use case for this approach anyway.

The actual time complexity depends on your assumptions, but of course in the full "arbitrary bit count" setting both multiplication and compaction are heavyweight operations and the fact that the output has a size quadratic in input size really doesn't help. For small enough n such that eg n² <= 64 (depending on word size), so everything fits in a machine word, it all works out well (even for unknown n, since all the required masks can be precomputed). On the other hand, for such small n it is also feasible to table the entire thing, doing a lookup for a pair (n, A).

Upvotes: 1

Related Questions