Bugaboo
Bugaboo

Reputation: 971

In-place run length decoding?

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place. The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX[length(encodedstirng),length(decodedstring)].

This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.

Also, one cannot assume that the decoded string is always larger than the encoded string. Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?

And it will always be a letter-digit pair, i.e. you will not be asked to converted 0515 to 0000055555

Upvotes: 9

Views: 7464

Answers (4)

Aaron McDaid
Aaron McDaid

Reputation: 27133

If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.

It will always be a letter-digit pair, hence you can delete the 1s from the string without any confusion.

A3B1C2D1E1

becomes

A3BC2DE

Here is some code, in C++, to remove the 1s from the string (O(n) complexity).

// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length) {
    assert(j <= i); // optional check
    if(str[i] != '1') {
        str[j] = str[i];
        ++ j;
    }
    ++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string

Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.

(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).

Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.

During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.

Upvotes: 7

Thomas Eding
Thomas Eding

Reputation: 1

The following solution is O(n) and in-place. The algorithm should not access memory it shouldn't, both read and write. I did some debugging, and it appears correct to the sample tests I fed it.


High level overview:

  • Determine the encoded length.
  • Determine the decoded length by reading all the numbers and summing them up.
  • End of buffer is MAX(decoded length, encoded length).
  • Decode the string by starting from the end of the string. Write from the end of the buffer.
  • Since the decoded length might be greater than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.

int isDigit (char c) {
    return '0' <= c && c <= '9';
}

unsigned int toDigit (char c) {
    return c - '0';
}

unsigned int intLen (char * str) {
    unsigned int n = 0;
    while (isDigit(*str++)) {
        ++n;
    }
    return n;
}

unsigned int forwardParseInt (char ** pStr) {
    unsigned int n = 0;
    char * pChar = *pStr;
    while (isDigit(*pChar)) {
        n = 10 * n + toDigit(*pChar);
        ++pChar;
    }
    *pStr = pChar;
    return n;
}

unsigned int backwardParseInt (char ** pStr, char * beginStr) {
    unsigned int len, n;
    char * pChar = *pStr;
    while (pChar != beginStr && isDigit(*pChar)) {
        --pChar;
    }
    ++pChar;
    len = intLen(pChar);
    n = forwardParseInt(&pChar);
    *pStr = pChar - 1 - len;
    return n;
}

unsigned int encodedSize (char * encoded) {
    int encodedLen = 0;
    while (*encoded++ != '\0') {
        ++encodedLen;
    }
    return encodedLen;
}

unsigned int decodedSize (char * encoded) {
    int decodedLen = 0;
    while (*encoded++ != '\0') {
        decodedLen += forwardParseInt(&encoded);
    }
    return decodedLen;
}

void shift (char * str, int n) {
    do {
        str[n] = *str;
    } while (*str++ != '\0');
}

unsigned int max (unsigned int x, unsigned int y) {
    return x > y ? x : y;
}

void decode (char * encodedBegin) {
    int shiftAmount;
    unsigned int eSize = encodedSize(encodedBegin);
    unsigned int dSize = decodedSize(encodedBegin);
    int writeOverflowed = 0;
    char * read = encodedBegin + eSize - 1;
    char * write = encodedBegin + max(eSize, dSize);
    *write-- = '\0';
    while (read != encodedBegin) {
        unsigned int i;
        unsigned int n = backwardParseInt(&read, encodedBegin);
        char c = *read;
        for (i = 0; i < n; ++i) {
            *write = c;
            if (write != encodedBegin) {
                write--;
            }
            else {
                writeOverflowed = 1;
            }
        }
        if (read != encodedBegin) {
            read--;
        }
    }
    if (!writeOverflowed) {
        write++;
    }
    shiftAmount = encodedBegin - write;
    if (write != encodedBegin) {
        shift(write, shiftAmount);
    }
    return;
}

int main (int argc, char ** argv) {
    //char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    //char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    char * str = buff + 3;
    //char buff[256] = { "A1B1" };
    //char * str = buff;
    decode(str);
    return 0;
}

Upvotes: 2

Shayan Pooya
Shayan Pooya

Reputation: 1079

Another O(n^2) solution follows.

Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.

while ( there is an expandable element ):
    expand that element
    adjust (shift) all of the elements on the right side of the expanded element

Where:

  • Free space size is the number of empty elements left in the array.

  • An expandable element is an element that:

    expanded size - encoded size <= free space size
    

The point is that in the process of reaching from the run-length code to the expanded string, at each step, there is at least one element that can be expanded (easy to prove).

Upvotes: 0

Matt Lacey
Matt Lacey

Reputation: 8255

This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?

For instance, once you've read A3, you know that you need to make space for one extra character, if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store it's index).

Then loop though, moving the characters to their new slots:

To start: A|3|B|1|C|2||||||| Have a variable called end storing the index 5, i.e. the last, non-blank, entry.

You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).

Pseudocode for the move:

var n = array[cursor] - 2; // n = 1, the 3 from A3, and then minus 2 to allow for the pair.

for(i = end; i > cursor; i++) { array[i + n] = array[i]; }

This would leave you with:

A|3|A|3|B|1|C|2|||||

Now the A is there once already, so now you want to write n + 1 A's starting at the index stored in cursor:

for(i = cursor; i < cursor + n + 1; i++)
{
  array[i] = array[cursor - 1];
}

// increment the cursor afterwards!
cursor += n + 1;

Giving:

A|A|A|A|B|1|C|2|||||

Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.

Upvotes: 0

Related Questions