marshall
marshall

Reputation: 2483

How to speed up exponential time code

I am writing a simulation. The problem I need to solve is as follows.

I am given a random binary vector t (a list in python) of length l. I then sample binary vectors of the same length and measure the Hamming distance (that is the number of aligned mismatches) between each of the sampled vectors and t and store the results. I want to determine how many binary vectors of length l are compatible with the distances found so far. Clearly t is but it is also likely many others are too.

My code is as follows.

import random
import itertools
import operator

l=23
t = [random.randint(0,1) for b in range(l)]
stringsleft = set(itertools.product([0,1],repeat=l))

xcoords = []
ycoords = []
iters = l
for i in xrange(iters):
    print i, len(stringsleft)
    xcoords.append(i)
    ycoords.append(math.log(len(stringsleft)))
    pattern = [random.randint(0,1) for b in range(l)]
    distance = sum(itertools.imap(operator.ne, pattern, t))
    stringsleft = stringsleft.intersection(set([y for y in stringsleft if sum(itertools.imap(operator.ne, pattern, y)) == distance]))

Unfortunately it is very slow and uses a lot of memory and does not work at all if I increase l to 30. Is it possible to speed this up to solve the l=30 case?

Update
I made a small optimisation by replacing the lists by integers so now it runs with l=26.

l=26
t = random.randint(0,2**l)
stringsleft = set(range(2**l))
xcoords = []
ycoords = []
iters = l
for i in xrange(iters):
    print i, len(stringsleft)
    if (len(stringsleft) > 1):
        xcoords.append(i)
        ycoords.append(math.log(len(stringsleft),2))
    pattern = random.randint(0,2**l)
    distance = bin(pattern ^ t).count('1')
    stringsleft = stringsleft.intersection(set([y for y in stringsleft if bin(pattern ^ y).count('1') == distance]))

The problem that stops me getting to l=30 is RAM usage rather than time.

Upvotes: 1

Views: 320

Answers (3)

RemcoGerlich
RemcoGerlich

Reputation: 31260

I kept thinking about this, and the program I wrote in the end is basically Sneftel's method. At first, no choices are made and we have the list of Hamming distances. Then say we choose 1 for the first value, then all the sequences that had a 0 there are only compatible if the rest of the sequence is compatible with the hamming distance - 1. Same for 1.

I had a recursive version at first as well, but changed it to a loop using a stack so that I could use yield to turn it into a generator that yields compatible sequences.

One optimization I do is that if there is a guess with Hamming distance > l/2, I flip the 0s and 1s so that the new guess has guaranteed Hamming distance <= l/2.

I get decent performance if l=30, though it depends on the number of guesses and how lucky they are.

Because of my earlier Mastermind thoughts, I still think in terms of 'guesses' that try to figure out the 'sequence'.

Code:

import random

LENGTH = 30
NUMGUESSES = 20


def guess():
    return [random.randint(0, 1) for i in range(LENGTH)]

SEQUENCE = guess()
GUESSES = [guess() for i in range(NUMGUESSES)]


def hamming(a, b):
    return sum(aelem != belem for aelem, belem in zip(a, b))


# Flip if hamming > LENGTH / 2
mid = LENGTH / 2
for i in range(NUMGUESSES):
    if hamming(SEQUENCE, GUESSES[i]) > mid:
        GUESSES[i] = [1 - g for g in GUESSES[i]]

HAMMINGS = [hamming(SEQUENCE, g) for g in GUESSES]

print "Sequence:", SEQUENCE
print
for h, g in zip(HAMMINGS, GUESSES):
    print "Guess and hamming:", g, h
print


def compatible_sequences(guesses, hammings):
    # Use a stack instead of recursion, so we can make this a generator.
    stack = []
    # Start: we have chosen nothing yet, and the hammings have start values
    stack.append(([], hammings))

    while stack:
        so_far, hammingsleft = stack.pop()

        if -1 in hammingsleft:
            # Skip, choices so far are incompatible with the guesses
            continue

        if len(so_far) == LENGTH:
            # Success if all the Hamming distances were exactly correct
            if all(h == 0 for h in hammingsleft):
                yield so_far
            continue

        # Not done yet, add choices 0 and 1 to the queue
        place_in_guess = len(so_far)
        for choice in (1, 0):
            stack.append(
                (so_far + [choice],
                 [h - (g[place_in_guess] != choice)
                  for h, g in zip(hammingsleft, guesses)]))

print "Compatible sequences:"
for nr, sequence in enumerate(compatible_sequences(GUESSES, HAMMINGS), 1):
    print nr, sequence

Upvotes: 2

Sneftel
Sneftel

Reputation: 41474

A backtracking algorithm could work here. The state would be a (partial) vector and the remaining Hamming distance for each example vector. Given a state, you try appending a 0, and decrease the Hamming distance for each vector that had a 1 in that spot. If any distances drop below zero, the solution is inadmissible; otherwise, you recurse. Then you do the same thing with 1. Once the vector is complete, you output it if all distances are zero.

This is still exponential time, but should be considerably faster.

Upvotes: 1

RemcoGerlich
RemcoGerlich

Reputation: 31260

Edit: I'm wrong. In Mastermind, you also have knowledge of the number of colors that were right, but not in the right spot. That changes the number of possible solutions, so that there is no obvious exact translation between the two problems and so I can't make the conclusion below. Leaving the answer here for now, maybe it helps someone think about the problem.

I answered:

Bad news, it's at least NP-complete.

Your problem reminded me of the game Mastermind (Wikipedia). On that page, they also mention the "Mastermind satisfiability problem: given a set of two-color Mastermind guesses and answers, is there a possible right guess?

So that problem has slightly more information than you: Mastermind gives number of correct colors in the right place (== length minus Hammond distance), plus the number of correct colors in the wrong place. And they only try to decide whether the number of possibilities is > 0. And that problem is NP-complete according to this paper.

Upvotes: 1

Related Questions