Reputation: 585

How to permute binary string so to minimize distance between them?

I've got an array of weights, for example

[1, 0, 3, 5]

The distance between two strings is defined as a sum of weights for different bits, like this:

size_t distance(const std::string& str1, const std::string& str2, const std::vector<size_t>& weights) {
  size_t result = 0;
  for (size_t i = 0; i < str1.size(); ++i) {
    if (str1[i] != str2.at(i))
      result += weights.at(i);
  }
  return result;
}

and starting string for example

'1101'

I need to generate permutations in the way that strings with lowest distance from the original one go first, like this:

'1001'  # changed bits: 2nd. Because it has lowest weight. Distance is 0
'0101'  # changed bits: 1st.                               Distance is 1
'0001'  # changed bits: 1st, 2nd.                          Distance is 1
'1011'  # changed bits: 2nd, 3rd.                          Distance is 3
'1111'  # changed bits: 3rd.                               Distance is 3
'0111'  # changed bits: 1st, 3rd.                          Distance is 4
'0011'  # changed bits: 1st, 2nd, 3rd.                     Distance is 4
'1100'  # changed bits: 4th.                               Distance is 5
'1000'  # changed bits: 2nd, 4th.                          Distance is 5
'0100'  # changed bits: 1st, 4th.                          Distance is 6
'0000'  # changed bits: 1st, 2nd, 4th.                     Distance is 6
'1110'  # changed bits: 3rd, 4th.                          Distance is 8
'1010'  # changed bits: 2nd, 3rd, 4th.                     Distance is 8
'0110'  # changed bits: 1st, 3nd, 4th.                     Distance is 9
'0010'  # changed bits: 1st, 2nd, 3rd, 4th.                Distance is 9

I don't need code, I need just an algorithm which gets string of length N, array of weights of the same length and i as input and generates i-th permutation, without generating the whole list and sorting it.

Upvotes: 3

Answers (4)

Geezer

Reputation: 5710

In modern C++, the way to go about doing what you're asking is by using std::bitset to represent all possible bit multisets and then wrapping distance() with a comparer functor struct in order to call std::sort(). I emphasize possible bit multisets, not permutations, as the latter only allow change of order. Your code then will look something like this:

#include <string>
#include <array>
#include <cmath>
#include <bitset>
#include <vector>
#include <algorithm>
#include <iostream>

constexpr size_t BITSET_SIZE = 4;

size_t distance(const std::string& str1, const std::string& str2, const std::array<size_t, BITSET_SIZE>& weights) {
    size_t result = 0;
    for (size_t i = 0; i < str1.size(); ++i) {
        if (str1[i] != str2.at(i))
            result += weights.at(i);
    }
    return result;
}

struct of_lesser_distance
{
    const std::bitset<BITSET_SIZE>& originalBitSet;
    const std::array<size_t, BITSET_SIZE>& distanceVec;

    inline bool operator() (const std::bitset<BITSET_SIZE>& lhs, const std::bitset<BITSET_SIZE>& rhs)
    {
        return distance(originalBitSet.to_string(), lhs.to_string(), distanceVec) < distance(originalBitSet.to_string(), rhs.to_string(), distanceVec);
    }
};

int main()
{
    std::string s{"1101"};    
    std::array<size_t, 4> weights{1, 0, 3, 5};

    int possibleBitSetsCount = std::pow(2, s.length());

    std::vector<std::bitset<BITSET_SIZE>> bitSets;

    // Generates all possible bitsets
    for (auto i = 0; i < possibleBitSetsCount; i++)
        bitSets.emplace_back(i);

    // Sort them according to distance
    std::sort(bitSets.begin(), bitSets.end(), of_lesser_distance{ std::bitset<BITSET_SIZE>(s), weights });

    // Print
    for (const auto& bitset : bitSets)
        std::cout << bitset.to_string().substr(BITSET_SIZE - s.length(), s.length()) << " Distance: " << distance(s, bitset.to_string(), weights) << "\n";
}

Output:

1001 Distance: 0
1101 Distance: 0
0001 Distance: 1
0101 Distance: 1
1011 Distance: 3
1111 Distance: 3
0011 Distance: 4
0111 Distance: 4
1000 Distance: 5
1100 Distance: 5
0000 Distance: 6
0100 Distance: 6
1010 Distance: 8
1110 Distance: 8
0010 Distance: 9
0110 Distance: 9

Live version here.

Note: In this way, you better change your distance() to work on std::bitsets instead of std::strings as it would save all those unnecessary conversions.

I don't need code, I need just an algorithm

It's easier for me to give code but let me know if you wanted something else.

Upvotes: 1

James Poag

Reputation: 2380

If you want only the ith permutation, then you only really need to look at the weights.

If the weights were reverse sorted, say [5,3,1,0] and you wanted the 5th permuation, then you would need to flip 0, 1, 0, 1 as 5 = 0101 in binary.

So you need a very small mapping from the weight to it's original index. Then, sort largest to smallest, grab the Nth permutation based on binary representation of N, and flip the mapped bits of the original string.

Upvotes: 0

Soonts

Reputation: 21946

Sounds like hard problem.

If you are using size_t for permutation index, your strings will be limited to 32 or 64 characters, otherwise you’ll need larger integer for permutation index. Therefore, you can switch from strings to size_t bitmasks.

This way your algorithm no longer depends on the string, you find i-th bitmask, XOR it (^ operator in C++) with the input string bitmask, and you get your result. The hard part is finding i-th bitmask but this way, i.e. without using strings in the inner loops of the algorithm, the code will be much faster (an orders of magnitude).

Now the hard part is how to find the mask. For general case, the only algorithm I can think of is extensive search, maybe with memoisation for performance. This will be fast for small permutation indices but slow for large ones.

If you know your weights at compile-time, you can precompute indices into a search tree but it’s best done outside C++, it’s very hard to use template metaprogramming for complex algorithms like this one.

P.S. There’s one special case that might work for you. Sort weights, and check whether the following is true weights[N] == weights[N-1] || weights[N] >= sum( weights[0 .. N-1] for all 1<N<length, you only need a single loop over the sorted weights to check that. If it’s true for all weights, and also all weights are non-negative, the solution is trivially simple, and the performance will be very fast, just treat index as a XOR bitmask. The only thing you need to do, reorder bits in the index to match the order of the weights array that was changed as the result of sorting them. For your weights, switch first and second bits because the sorted order is [0,1,3,5].

BTW, the weights you have in the question satisfy that condition, because 1>=0, 3>=0+1 and 5>=0+1+3, so this simple algorithm will work OK for your particular weights.

Update: here's a complete solution. It prints slightly different result than your sample, e.g. in your example you have '1011' then '1111' , my code will print '1011' immediately after '1111' but their distance is the same, i.e. my algorithm still works OK.

#include <string>
#include <vector>
#include <algorithm>
#include <stdio.h>

struct WeightWithBit
{
    size_t weight, bit;
};

// Sort the weights while preserving the original order in the separate field
std::vector<WeightWithBit> sortWeights( const std::vector<size_t>& weights )
{
    std::vector<WeightWithBit> sorted;
    sorted.resize( weights.size() );
    for( size_t i = 0; i < weights.size(); i++ )
    {
        sorted[ i ].weight = weights[ i ];
        sorted[ i ].bit = ( (size_t)1 << i );
    }
    std::sort( sorted.begin(), sorted.end(), []( const WeightWithBit& a, const WeightWithBit& b ) { return a.weight < b.weight; } );
    return sorted;
}

// Check if the simple bit-based algorithm will work with these weights
bool willFastAlgorithmWork( const std::vector<WeightWithBit>& sorted )
{
    size_t prev = 0, sum = 0;
    for( const auto& wb : sorted )
    {
        const size_t w = wb.weight;
        if( w == prev || w >= sum )
        {
            prev = w;
            sum += w;
            continue;
        }
        return false;
    }
    return true;
}

size_t bitsFromString( const std::string& s )
{
    if( s.length() > sizeof( size_t ) * 8 )
        throw std::invalid_argument( "The string's too long, permutation index will overflow" );
    size_t result = 0;
    for( size_t i = 0; i < s.length(); i++ )
        if( s[ i ] != '0' )
            result |= ( (size_t)1 << i );
    return result;
}

std::string stringFromBits( size_t bits, size_t length )
{
    std::string result;
    result.reserve( length );
    for( size_t i = 0; i < length; i++, bits = bits >> 1 )
        result += ( bits & 1 ) ? '1' : '0';
    return result;
}

// Calculate the permitation. Index is 0-based, 0 will return the original string without any changes.
std::string permitation( const std::string& str, const std::vector<WeightWithBit>& weights, size_t index )
{
    // Reorder the bits to get the bitmask.
    // BTW, if this function is called many times for the same weights, it's a good idea to extract just the ".bit" fields and put it into a separate vector, memory locality will be slightly better.
    size_t reordered = 0;
    for( size_t i = 0; index; i++, index = index >> 1 )
        if( index & 1 )
            reordered |= weights[ i ].bit;

    // Convert string into bits
    const size_t input = bitsFromString( str );
    // Calculate the result by flipping the bits in the input according to the mask.
    const size_t result = input ^ reordered;
    // Convert result to string
    return stringFromBits( result, str.length() );
}

int main()
{
    const std::vector<size_t> weights = { 1, 0, 3, 5 };

    using namespace std::literals::string_literals;
    const std::string theString = "1101"s;

    if( weights.size() != theString.length() )
    {
        printf( "Size mismatch" );
        return 1;
    }

    if( weights.size() > sizeof( size_t ) * 8 )
    {
        printf( "The string is too long" );
        return 1;
    }

    // Sort weights and check are they suitable for the fast algorithm
    const std::vector<WeightWithBit> sorted = sortWeights( weights );
    if( !willFastAlgorithmWork( sorted ) )
    {
        printf( "The weights aren't suitable for the fast algorithm" );
        return 1;
    }

    // Print all permutations
    const size_t maxIndex = ( 1 << weights.size() ) - 1;
    for( size_t i = 0; true; i++ )
    {
        const std::string p = permitation( theString, sorted, i );
        printf( "%zu: %s\n", i, p.c_str() );
        if( i == maxIndex )
            break;  // Avoid endless loop when the string is exactly 32 or 64 characters.
    }
    return 0;
}

Upvotes: 1

Neowizard

Reputation: 3017

This problem cannot be efficiently solved. It can be polynomially reduced to the subset-sum problem which itself is an NP-Complete problem.

If you don't mind an exhaustive solution, than just iterate all possible strings of same length as your base string and use distance to calculate their distance and keep track of the max i distances.

Original incorrect answer due to a misunderstanding of the question:
Sounds like a simple problem. Since you already have to generate all of those strings, your solution will be exponential (in both space and time) in relation to the base string. You're basically unconstrained.

You can try something like^[1]:
1. Generate all possible strings of same length as base string. It's rather simple. Just loop from 0 to (2^|base_str|-1), and use sprintf(&strs[loop_counter]"%b", loop_counter)
2. sort strs using qsort and use distance as the comperator. Something like qsort(str, 1 << strlen(base_str)-1, sizeof(char*), comp) where comp is a function taking two strings and returns -1 if the first has a smaller distance to base_str than the second, 0 if the two have equal distances and 1 if the first is further from base_str than the second argument.

~~^[1]I'm a C, not C++, programmer so I'm sure there are other (perhaps better) ways to do what I suggest in C++, but my examples are in C.~~

Upvotes: 0

How to permute binary string so to minimize distance between them?

Answers (4)

Related Questions