String decode: looking for a better approach

Question

I have worked out an O(n^2) solution to the problem below and was wondering whether there is a better one. (This is not a homework or interview problem, just something I do out of my own interest, hence sharing it here.)

Let a=1, b=2, c=3, …, z=26. Given a string of digits, find all possible codes that string
can generate. Example: "1123" should give:
aabc //a = 1, a = 1, b = 2, c = 3
kbc // since k is 11, b = 2, c= 3
alc // a = 1, l = 12, c = 3
aaw // a= 1, a =1, w= 23
kw // k = 11, w = 23

Here is my code to the problem:

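#include <vector>

using std::vector;

// Walk the digits right to left; `strings` collects every decoding as a list of numbers.
void alpha(int* a, int sz, vector<vector<int>>& strings) {
    for (int i = sz - 1; i >= 0; i--) {
        if (i == sz - 1) {
            // Last digit: it forms the first (and so far only) sequence.
            vector<int> t;
            t.push_back(a[i]);
            strings.push_back(t);
        } else {
            int k = strings.size();

            for (int j = 0; j < k; j++) {
                vector<int> t = strings[j];
                // Option 1: a[i] stands alone in front of the existing sequence.
                strings[j].insert(strings[j].begin(), a[i]);

                // Option 2: merge a[i] with a leading single digit if the pair is <= 26.
                if (t[0] < 10) {
                    int n = a[i] * 10 + t[0];

                    if (n <= 26) {
                        t[0] = n;
                        strings.push_back(t);
                    }
                }
            }
        }
    }
}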

Essentially, the vector strings holds the sequences of numbers. I believe this runs in O(n^2). I am trying to get my head around a solution that is at least O(n log n).

Intuitively, a tree should help here, but I am not getting anywhere beyond that.

grek40 · Accepted Answer

Generally, your problem complexity is more like 2^n, not n^2, since your k can increase with every iteration.

This is an alternative recursive solution (note: deep recursion can exhaust the stack for very long codes). I didn't focus on optimization, since I'm not up to date with C++X, but I think the recursive solution could be optimized with some moves.

Recursion also makes the complexity a bit more obvious compared to the iterative solution.

#include <cstddef>
#include <deque>
#include <vector>

// Add the front element to each trailing code sequence. Create a new sequence if none exists
void update_helper(int front, std::vector<std::deque<int>>& intermediate)
{
    if (intermediate.empty())
    {
        intermediate.push_back(std::deque<int>());
    }
    for (size_t i = 0; i < intermediate.size(); i++)
    {
        intermediate[i].push_front(front);
    }
}

std::vector<std::deque<int>> decode(int digits[], int count)
{
    if (count <= 0)
    {
        return std::vector<std::deque<int>>();
    }

    // Case 1: the first digit is a code on its own
    std::vector<std::deque<int>> result1 = decode(digits + 1, count - 1);
    update_helper(*digits, result1);

    // Case 2: the first two digits together form a code (<= 26)
    if (count > 1 && (digits[0] * 10 + digits[1]) <= 26)
    {
        std::vector<std::deque<int>> result2 = decode(digits + 2, count - 2);

        update_helper(digits[0] * 10 + digits[1], result2);

        result1.insert(result1.end(), result2.begin(), result2.end());
    }

    return result1;
}

Call:

std::vector<std::deque<int>> strings = decode(codes, size);
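
As a usage sketch (not part of the original answer), the number sequences returned by decode can be turned into letter strings. The helper name to_letters is hypothetical and assumes the question's mapping a=1 … z=26; update_helper and decode are repeated from above only so that the block compiles on its own.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Same helper as above: prepend `front` to every sequence, creating one if none exists.
void update_helper(int front, std::vector<std::deque<int>>& intermediate)
{
    if (intermediate.empty())
    {
        intermediate.push_back(std::deque<int>());
    }
    for (std::size_t i = 0; i < intermediate.size(); i++)
    {
        intermediate[i].push_front(front);
    }
}

// Same recursive decode as above.
std::vector<std::deque<int>> decode(int digits[], int count)
{
    if (count <= 0)
    {
        return std::vector<std::deque<int>>();
    }
    std::vector<std::deque<int>> result1 = decode(digits + 1, count - 1);
    update_helper(*digits, result1);
    if (count > 1 && (digits[0] * 10 + digits[1]) <= 26)
    {
        std::vector<std::deque<int>> result2 = decode(digits + 2, count - 2);
        update_helper(digits[0] * 10 + digits[1], result2);
        result1.insert(result1.end(), result2.begin(), result2.end());
    }
    return result1;
}

// Hypothetical helper (an assumption, not from the answer):
// map each number 1..26 to 'a'..'z' and join the letters of one sequence.
std::vector<std::string> to_letters(const std::vector<std::deque<int>>& codes)
{
    std::vector<std::string> out;
    for (const std::deque<int>& code : codes)
    {
        std::string s;
        for (int n : code)
        {
            s.push_back(static_cast<char>('a' + n - 1));
        }
        out.push_back(s);
    }
    return out;
}
```

For the question's input { 1, 1, 2, 3 }, to_letters(decode(codes, 4)) yields aabc, aaw, alc, kbc and kw, in that order.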

Edit:

Regarding the complexity of the original code, I'll try to show what would happen in the worst case scenario, where the code sequence consists only of 1 and 2 values.

void alpha(int* a, int sz, vector<vector<int>>& strings)
{
    for (int i = sz - 1;
        i >= 0;
        i--)
    {
        if (i == sz - 1)
        {
            vector<int> t;
            t.push_back(a[i]);
            strings.push_back(t); // strings.size+1
        } // if summary: O(1), ignoring capacity change, strings.size+1
        else
        {
            int k = strings.size();

            for (int j = 0; j < k; j++)
            {
                vector<int> t = strings[j]; // O(strings[j].size) vector copy operation

                strings[j].insert(strings[j].begin(), a[i]); // strings[j].size+1
                // note: strings[j].insert treated as O(1) because other containers could do better than vector

                if (t[0] < 10)
                {
                    int n = a[i] * 10 + t[0];

                    if (n <= 26)
                    {
                        t[0] = n;
                        strings.push_back(t); // strings.size+1
                        // O(1), ignoring capacity change and copy operation

                    } // if summary: O(1), strings.size+1

                } // if summary: O(1), ignoring capacity change, strings.size+1

            } // for summary: O(k * strings[j].size), strings.size+k, strings[j].size+1

        } // else summary: O(k * strings[j].size), strings.size+k, strings[j].size+1

    } // for summary: O(sum[i from 1 to sz] of (k * strings[j].size))
    // k (same as strings.size) can double in an iteration => k grows exponentially, bounded by 2^sz
    // strings[j].size increases by 1 each iteration
    // k * strings[j].size therefore grows exponentially as well (it's getting huge)
}

Maybe I made a mistake somewhere, and if we want to play nice we can treat a vector copy as O(1) instead of O(n) to reduce the complexity. The hard fact remains that in the worst case the outer vector doubles in size on each pass of the inner loop (or at least on every second pass, considering the exact structure of the if conditions), and the inner loop depends on that growing vector size. That makes the whole thing exponential in n, somewhere between 2^(n/2) and 2^n.

Edit2:

I figured out the result complexity (the best hypothetical algorithm still needs to create every element of the result, so the result size is a lower bound on what any algorithm can achieve).

It actually follows the Fibonacci numbers:

For worst case input (like only 1s) of size N+2 you have:

size N has k(N) elements
size N+1 has k(N+1) elements
size N+2 is the combination of codes starting with a followed by the combinations from size N+1 (a takes one element of the source) and the codes starting with k, followed by the combinations from size N (k takes two elements of the source)
size N+2 has k(N) + k(N+1) elements

Starting with size 1 => 1 (a) and size 2 => 2 (aa or k)

Result: still exponential growth ;)
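
The recurrence above can be turned into a routine that runs in linear time, because it only counts the results instead of building them. This is a sketch, not part of the original answer: the name count_decodings is hypothetical, and it assumes the question's convention (every digit is 1 to 9, and a pair is valid when its value is at most 26).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the decodings without enumerating them.
// Assumes every digit is in 1..9 (no zeros), as in the question's example.
std::uint64_t count_decodings(const std::vector<int>& digits)
{
    // k_next  = count for the suffix starting one position later,
    // k_next2 = count for the suffix starting two positions later.
    std::uint64_t k_next = 1;   // empty suffix: exactly one (empty) decoding
    std::uint64_t k_next2 = 0;  // unused until a pair is possible
    for (std::size_t i = digits.size(); i-- > 0;)
    {
        std::uint64_t k = k_next;  // digits[i] taken alone
        if (i + 1 < digits.size() && digits[i] * 10 + digits[i + 1] <= 26)
        {
            k += k_next2;          // digits[i] and digits[i + 1] taken as a pair
        }
        k_next2 = k_next;
        k_next = k;
    }
    return k_next;
}
```

For the question's input 1123 it returns 5, matching the five codes listed there, and for ten 1s it returns 89. Enumerating the results still takes exponential time; only the counting is cheap.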

Edit3:

Worked out a dynamic programming solution, somewhat similar to your approach with reverse iteration over the code array and kind of optimized in its vector usage, based on the properties explained in Edit2.

The inner loop (update_helper) is still dominated by the count of results (worst case Fibonacci) and a few outer loop iterations will have a decent count of sub-results, but at least the sub-results are reduced to a pointer to some intermediate node, so copying should be pretty efficient. As a little bonus, I switched the result from numbers to characters.

Another edit: updated code with range 0 - 25 as 'a' - 'z', fixed some errors that led to wrong results.

#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

struct const_node
{
    const_node(char content, const_node* next)
        : next(next), content(content)
    {
    }

    const_node* const next;
    const char content;
};

// put front in front of each existing sub-result
void update_helper(int front, std::vector<const_node*>& intermediate)
{
    for (size_t i = 0; i < intermediate.size(); i++)
    {
        intermediate[i] = new const_node(front + 'a', intermediate[i]);
    }
    if (intermediate.empty())
    {
        intermediate.push_back(new const_node(front + 'a', NULL));
    }
}

std::vector<const_node*> decode_it(int digits[9], size_t count)
{
    int current = 0;
    std::vector<const_node*> intermediates[3];
    for (size_t i = 0; i < count; i++)
    {
        current = (current + 1) % 3;
        int prev = (current + 2) % 3; // -1
        int prevprev = (current + 1) % 3; // -2

        size_t index = count - i - 1; // invert direction

        // copy from prev
        intermediates[current] = intermediates[prev];
        // update current (part 1)
        update_helper(digits[index], intermediates[current]);

        if (index + 1 < count && digits[index] &&
            digits[index] * 10 + digits[index + 1] < 26)
        {
            // update prevprev
            update_helper(digits[index] * 10 + digits[index + 1], intermediates[prevprev]);
            // add to current (part 2)
            intermediates[current].insert(intermediates[current].end(), intermediates[prevprev].begin(), intermediates[prevprev].end());
        }
    }
    return intermediates[current];
}

void cleanupDelete(std::vector<const_node*>& nodes);

int main()
{
    int code[] = { 1, 2, 3, 1, 2, 3, 1, 2, 3 };
    int size = sizeof(code) / sizeof(int);
    std::vector<const_node*> result = decode_it(code, size);

    // output
    for (size_t i = 0; i < result.size(); i++)
    {
        std::cout.width(3);
        std::cout.flags(std::ios::right);
        std::cout << i << ": ";
        const_node* item = result[i];
        while (item)
        {
            std::cout << item->content;
            item = item->next;
        }
        std::cout << std::endl;
    }

    cleanupDelete(result);
}


void fillCleanup(const_node* n, std::set<const_node*>& all_nodes)
{
    if (n)
    {
        all_nodes.insert(n);
        fillCleanup(n->next, all_nodes);
    }
}

void cleanupDelete(std::vector<const_node*>& nodes)
{
    // this is like multiple inverse trees, hard to delete correctly, since multiple next pointers refer to the same target
    std::set<const_node*> all_nodes;
    for (auto var : nodes)
    {
        fillCleanup(var, all_nodes);
    }
    nodes.clear();
    for (auto var : all_nodes)
    {
        delete var;
    }
    all_nodes.clear();
}

A drawback of the dynamically reused structure is the cleanup, since you wanna be careful to delete each node only once.
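
One way to sidestep the manual cleanup is to let reference counting own the shared tails. The following is only a sketch under assumptions, not the answer's code: Node, NodePtr, prepend, decode_shared and list_to_string are hypothetical names, std::shared_ptr replaces the raw pointers, and it uses the question's a=1 … z=26 mapping with every digit in 1 to 9.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable list node; tails are shared between results.
struct Node
{
    char content;
    std::shared_ptr<const Node> next;
};
using NodePtr = std::shared_ptr<const Node>;

// Build new heads, each being `letter` followed by one of `tails`.
// An empty `tails` stands for "end of input" and yields one single-letter result.
std::vector<NodePtr> prepend(char letter, const std::vector<NodePtr>& tails)
{
    std::vector<NodePtr> heads;
    if (tails.empty())
    {
        heads.push_back(std::make_shared<Node>(Node{letter, nullptr}));
    }
    for (const NodePtr& tail : tails)
    {
        heads.push_back(std::make_shared<Node>(Node{letter, tail}));
    }
    return heads;
}

// Right-to-left pass that keeps only the results of the two previous suffixes.
// Assumes digits are 1..9, so every non-empty suffix has at least one decoding.
std::vector<NodePtr> decode_shared(const std::vector<int>& digits)
{
    std::vector<NodePtr> next1;  // results for the suffix starting at i + 1
    std::vector<NodePtr> next2;  // results for the suffix starting at i + 2
    for (std::size_t i = digits.size(); i-- > 0;)
    {
        std::vector<NodePtr> current =
            prepend(static_cast<char>('a' + digits[i] - 1), next1);
        if (i + 1 < digits.size())
        {
            int pair = digits[i] * 10 + digits[i + 1];
            if (pair <= 26)
            {
                std::vector<NodePtr> paired =
                    prepend(static_cast<char>('a' + pair - 1), next2);
                current.insert(current.end(), paired.begin(), paired.end());
            }
        }
        next2 = std::move(next1);
        next1 = std::move(current);
    }
    return next1;
}

// Follow the next pointers and collect the letters of one result.
std::string list_to_string(const NodePtr& head)
{
    std::string s;
    for (const Node* n = head.get(); n != nullptr; n = n->next.get())
    {
        s.push_back(n->content);
    }
    return s;
}
```

When the returned vector goes out of scope, every node is freed exactly once, as soon as its last owner disappears. The trade-off is the reference-counting overhead, and destroying a very long chain recurses through the node destructors, so this suits inputs of modest length.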
