Maciek Wrona
Maciek Wrona

Reputation: 29

Find optimal set of substrings in given string

I'm trying to find optimal set of strings for given string.

Given string: "FEEJEEDAI"

Substrings values:

FE - 1
JE - 2
JEE - 3
AI - 4
DAI - 6

Possible combinations:

1) [FE-JE-DAI] - 1+2+6 = 9
2) [FE-JEE-DAI] - 1+3+6 = 10
3) [FE-JE-AI] - 1+3+4 = 8

OPTIMAL COMBINATION - 2) [FE-JEE-DAI] scores 10

I think it should go something like this:

1) check if string contain particular substring:

var string = "FEEJEEDAI", substring = "JE"; string.indexOf(substring) !== -1;

2) If true find it's index

var subStringIndex = string.indexOf(substring)

3) Create new tempString to build combination and 'cut off' substring from string

var tempString = string.slice(subStringIndex, substring.length)

4) Iterate through string and find optimal tempString

I don't know how to build it into loop and and handle situation JE vs JEE, AI vs DAI

Upvotes: 1

Views: 1148

Answers (2)

Nina Scholz
Nina Scholz

Reputation: 386726

Basically, you could use an iterative and recursive approach for getting all possible substrings of the string.

This solution is splitted into 3 parts

  1. Preparation
  2. Collecting parts
  3. Calculating score and create result set

Preparation

At start, all substrings of the string are collected in the indices object. The key is the index and the value is an object with a limit, which is the smallest length of the strings in the pattern array. The pattern array contains the index and the found substrings beginning at that index.

indices object from the first example

{
    0: {
        limit: 2,
        pattern: [
            {
                index: 0,
                string: "FE"
            }
        ]
    },
    3: {
        limit: 2,
        pattern: [
            {
                index: 3,
                string: "JE"
            },
            {
                index: 3,
                string: "JEE"
            }
        ]
    },
    /* ... */
}

Collecting parts

The main idea is to start at index zero with an empty array for collecting substrings.

To check, which parts are together in a group, you need to get the first substring at a given index or the next close one, then take the limit property, which is the length of the shortest substring, add the index and take it as the maximum index for searching group members.

From the second example the first group consists of 'FE', 'EE' and 'EEJ'

string      comment
----------  -------------------------------------
01 2345678  indices
FE|EJEEDAI  
FE|         matching pattern FE  at position 0
 E|E        matching pattern EE  at position 1
 E|EJ       matching pattern EEJ at position 1
^^          all starting substrings are in the same group

With that group, a new recursion is invoked, with an adjusted index and with the substring concatinated to the parts array.

Calculating score and create result set

If no more substring are found, the parts are joined and the score is calculated and pushed to the result set.

Interpreting the result

 [
    {
        parts: "0|FE|3|JE|6|DAI",
        score: 9
    },
    /* ... */
]

parts are a combination of indices and matching strings at the position

 0|FE|3|JE|6|DAI
 ^ ^^            at index 0 found FE
      ^ ^^       at index 3 found JE
           ^ ^^^ at index 6 found DAI

score is calculated with the given weights of the substrings

substring  weight
---------  ------
 FE            1
 JE            2
 DAI           6
---------  ------
score          9

The example three returns 11 unique combinations.

function getParts(string, weights) {

    function collectParts(index, parts) {
        var group, limit;
        while (index < string.length && !indices[index]) {
            index++;
        }
        if (indices[index]) {
            group = indices[index].pattern;
            limit = index + indices[index].limit;
            while (++index < limit) {
                if (indices[index]) {
                    group = group.concat(indices[index].pattern);
                }
            }
            group.forEach(function (o) {
                collectParts(o.index + o.string.length, parts.concat(o.index, o.string));
            });
            return;
        }
        result.push({
            parts: parts.join('|'),
            score: parts.reduce(function (score, part) { return score + (weights[part] || 0); }, 0)
        });
    }

    var indices = {},
        pattern,
        result = [];

    Object.keys(weights).forEach(function (k) {
        var p = string.indexOf(k);
        while (p !== -1) {
            pattern = { index: p, string: k };
            if (indices[p]) {
                indices[p].pattern.push(pattern);
                if (indices[p].limit > k.length) {
                    indices[p].limit = k.length;
                }
            } else {
                indices[p] = { limit: k.length, pattern: [pattern] };
            }
            p = string.indexOf(k, p + 1);
        }
    });
    collectParts(0, []);
    return result;
}

console.log(getParts("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6 }));
console.log(getParts("FEEJEEDAI", { FE: 1, JE: 2, JEE: 3, AI: 4, DAI: 6, EEJ: 5, EJE: 3, EE: 1 }));
console.log(getParts("EEEEEE", { EE: 2, EEE: 3 }));
.as-console-wrapper { max-height: 100% !important; top: 0; }

Upvotes: 8

JohnLRBrock
JohnLRBrock

Reputation: 1

If you're slicing out the substring when you find them, since certain substrings are substrings of other substrings, search for the biggest ones first. For example, if you didn't find DAI, and you find AI, it can't be a part of DAI. You want to test for each substring, so you can put each substring into an array and loop through the array.

Upvotes: 0

Related Questions