Sean Saito
Sean Saito

Reputation: 675

Generating n binary vectors where each vector has a Hamming distance of d from every other vector

I'm trying to generate n binary vectors of some arbitrary length l, where each vector i has a Hamming distance of d (where d is even) from every other vector j. I'm not sure if there are any theoretical relationships between n, l, and d, but I'm wondering if there are any implementations for this task. My current implementation is shown below. Sometimes I am successful, other times the code hangs, which indicates either a) it's not possible to find n such vectors given l and d, or b) the search takes a long time especially for large values of l.

My questions are:

Example:

codebook = generate_codebook(4,3,2)
codebook

1 1111
2 1001
3 0101

Upvotes: 6

Views: 727

Answers (2)

javidcf
javidcf

Reputation: 59711

Not a real solution, but more of a partial discussion about the relationship between l, d and n and the process of generating vectors. In any case, you may consider posting the question (or a similar one, in more formal terms) to Mathematics Stack Exchange. I have been reasoning as I was writing, but I hope I didn't make a mistake.

Let's say we have l = 6. Since the Hamming distance depends only on position-wise differences, you can decide to start by putting one first arbitrary vector in your set (if there are solutions, some may not include it, but at least one should). So let's begin with an initial v1 = 000000. Now, if d = 1 then obviously n can only be 1 or 2 (with 111111). If d = 1, you will find that n can also only be 1 or 2; for example, you could add 000001, but any other possible vector will have a distance of 2 or more with at least one the vectors you have.

Let's say d = 4. You need to change 4 positions and keep the other 2, so you have 4-combinations from a 6-element set, which is 15 choices, 001111, 010111, etc. - you can see now that the binomial coefficient C(n, d) plus 1 is an upper bound for n. Let's pick v2 = 001111, and say that the kept positions are T = [1, 2] and the changed ones are S = [3, 4, 5, 6]. Now to go on, we could consider making changes to v2; however, in order to keep the right distances we must follow these rules:

  • We must make 4 changes to v2.
  • If we change a position in a position in S, we must make another change in a position in T (and viceversa). Otherwise, the distance to v1 would not be kept.

Logically, if d were odd you would be done now (only sets of two elements could be formed), but fortunately you already said that your distance numbers are even. So we divide our number by two, which is 2, and need to pick 2 elements from S, C(4, 2) = 6, and 2 elements from T, C(2, 2) = 1, giving us 6 * 1 = 6 options - you should note now that C(d, d/2) * C(l - d, d/2) + 2 is a new, lower upper bound for n, if d is even. Let's pick v3 = 111100. v3 has now four kinds of positions: positions that have changed with respect to both v1 and v2, P1 = [1, 2], positions that have not changed with respect to either v1 or v2, P2 = [] (none in this case), positions that have changed with respect to v1 but not with respect to v2, P3 = [3, 4], and positions that have changed with respect to v2 but not with respect to v1, P4 = [5, 6]. Same deal, we need 4 changes, but now each change we make to a P1 position must imply a change in a P2 position, and each change we make to a P3 position must imply a change in a P4 position. The only remaining option is v4 = 110011, and that would be it, the maximum n would be 4.

So, thinking about the problem from a combinatoric point of view, after each change you will have an exponentially increasing number of "types of positions" (2 after the first change, 4 after the second, 8, 16...) defined in terms of whether they are equal or not in each of the previously added vectors, and these can be arranged in couples through a "symmetry" or "complement" relationship. On each step, you can (I think, and this is the part of this reasoning that I am less sure about) greedily choose a set of changes from these couples and compute the sizes of the "types of positions" for the next step. If this is all correct, you should be able to write an algorithm based on this to generate and/or count the possible sets of vectors for some particular l and d and n if given.

Upvotes: 2

DAle
DAle

Reputation: 9117

Let's build a graph G where every L-bit binary vector v is a vertex. And there is an edge (vi, vj) only when a Hamming distance between vi and vj is equal to d. Now we need to find a clique of size n is this graph.

Clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent.

The task of finding a clique of given size in an arbitrary graph is NP-complete. You can read about this problem and some algorithms in this wikipedia article.

There are many special cases of this problem. For example, for perfect graphs there is a polynomial algorithm. Don't know if it is possible to show that our graph is one of these special cases.

Upvotes: 2

Related Questions