torbinsky
torbinsky

Reputation: 1468

Generating Variants of a String According to Rules

What I'm hoping to get as an answer:

I'm hoping someone can tell me the general class(es) of algorithms which solve these types of problems. Ideally, the algorithms could be run in parallel, but I am still very interested in algorithms which can only be executed sequentially. Specific algorithm names or literature references would be much appreciated :)

What I've done so far:

I basically approached this using some brute force methods involving generating all permutations of a given string length and then filtering out ones that don't match the set of rules. I don't like this approach though, as it is obviously quite slow and quickly becomes limited by my computing power (1 PC) and the size of the string. I started coming up with little optimizations to this approach, but felt like I must be re-inventing the wheel and there is likely a set of algorithms out there already for solving these kinds of problems.

Problem description:

Consider a string like the following:

ABCDBEACBDECDBAACBE

I'm trying to figure out efficient methods for generating ALL permutations of a string which follow a set of rules/conditions.

Examples of Rules:


Simple example:

  1. There must be at least 2 differences from the original string
  2. The differences must be in the alphabet set (A,B,C,D,E)

An example of a single, valid permutation, given the above rules would be:

DE CDBEACBDECDBAACBE


Complicated rule set example 1:

  1. There must be at least 2 differences from the original string
  2. The differences must be in the alphabet set (A,B,C,D,E)
  3. NEW: Differences must occur within defined index ranges of the string
  4. NEW: Each range must 2 have differences

So, given the original string, let's say we have the following ranges defined:

[(0,4), (8,14)]

This would look like:

[ABCDB]EAC[BDECDBA]ACBE

And a valid permutation example is:

BD CDBEACB AB CDBAACBE


Complicated rule set example 2:

  1. There must be at least 2 differences from the original string
  2. The differences must be in the alphabet set (A,B,C,D,E)
  3. Differences must occur within defined index ranges of the string
  4. Each range must have 2 differences
  5. NEW: Differences must be a specific combination, let's say "AB"

Ranges:

[(0,4), (8,14)]

A valid permutation example is:

B AB DBEACB AB CDBAACBE


Complicated rule set example 3:

  1. There must be at least 2 differences from the original string
  2. The differences must be in the alphabet set (A,B,C,D,E)
  3. Differences must occur within defined index ranges of the string
  4. Each range must have 2 differences
  5. NEW: Ranges can overlap

Ranges:

[(0,4), (3,7)]

This would look like:

[ABCD[B]EAC]BDECDBAACBE

A valid permutation example is:

ABC ADB ACBDECDBAACBE


Upvotes: 1

Views: 189

Answers (1)

Peter de Rivaz
Peter de Rivaz

Reputation: 33509

Suggested approach

I would recommend you read up on a deterministic finite automaton.

The way I would attempt to model this kind of problem is to turn the problem into a state machine that is able to recognise strings of the type that you want based on feeding characters in one at a time.

Once you have a state machine, you can use dynamic programming to count the number of matching strings that can be generated by starting in a particular state. (The state incorporates knowing how many characters have been generated so far, and the progress in each of your rules.)

With the results of this dynamic programming, you can generate the k^th lexicographically ordered example of a matching string in time O(n) where n is the length of the string.

However, the cost of solving the dynamic programming will vary a lot depending on the complexity of your rule set.

Example

For your example 3:

There must be at least 2 differences from the original string
The differences must be in the alphabet set (A,B,C,D,E)
Differences must occur within defined index ranges of the string
Each range must have 2 differences

the state would need to include the following information:

Number of differences in first range (0,1,2)
Number of differences in second range (0,1,2)

so there would be a total of 9 states to model.

For a system with R ranges, each requiring D differences, there would be (D+1)^R states required (fewer if the ranges do not overlap).

Upvotes: 1

Related Questions