Possibilities of removing two characters from string to get unique ones

Question

I need to obtain a number of unique strings after removing exactly two characters. An algorithm should also give a possibility to print these new words. The input string is made from 'a' to 'z' ASCII characters only.

For example for an input string "pizza" we have 7 strings: "zza", "iza", "izz", "pza", "pzz", "pia", "piz".

My priority is high efficiency of an algorithm.

I abandoned idea with set containers to avoid duplicates due to a huge memory complexity (for 2400-character input string I filled whole RAM and swap). This solution also seems not to be so fast.

So I decided to design smarter algorithm which is similar to RLE compression, so I showed "pizza" as blocks: 1xP, 1xI, 2xZ, 1xA. Now we can see that for the first removing character it only matters from which block I removed character (it does not matter which 'z' we removed). So in the first stage we are removing one character from for example first block. In the second stage we are removing second character from still existing blocks. (We repeat procedure in a loop for an every case).

The problem is this solution will work good for removing one character. But for two characters it may generate dupicates. For example input "aba" we have blocks 1xA, 1xB, 1xA: we are removing one character from first block and we get 1xB, 1xA. Next we are removing next character from first existing block (1xB) and we get 1xA. Next algorithm goes into next block of original input string (1xB) and removes one character belonging to the block and we get 2xA. Next we are removing second character from existing blocks (we have got only one block), and we get 1xA. Here we have got duplicate.

In general the algorithm will fail with such words "abababababa..." (we will get same case when we remove first "ab" or any other neighboring.

I need suggestions how to neatly deal with this duplication. Maybe you got completely different ideas for an algorithm.

גלעד ברקן · Accepted Answer

I think there is an O(n) time, O(1) space solution for calculating the number (O(n^3) worst case of course for the actual strings). For any one character removed, consider that a string would equal another only when the character removed is from the same contiguous block of identical characters. For example, "piz_a" and "pi_za".

First, we'll count non-adjacent removals. Calculate a tally in O(n) time representing how many blocks of contiguous identical characters are to one side, say left, of the current character, excluding the block the character is in. For example, in "pizza", we would have [0,1,2,2,3]. As we traverse, add the number in array[i-1] to the total, but only if the current char is different from the previous. For "pizza", we would have `(skip) + 0 + 1 + (skip) + 2 = 3 so far. (I used an array for demonstration, although we only need to keep the previous element.)

Now traverse a single window of two adjacent characters, adding 1 only when the two characters do not equal the previous two and the second character in the pair does not equal the character before the pair (like "ba" in "aba"). For "pizza", we get (skip) + 1 + 1 + 1 + 1 = 4, which together with the count for non-adjacent removals sums to 7.

For "ababab", the pair traversal would only count the first "ab" removal, while the non-adjacent pair traversal would count: "_b_bab", "_ba_ab", "a_a_ab", "_bab_b", "a_ab_b", "ab_b_b", "_baba_", "a_aba_", "ab_ba_", and "aba_a_".

To print the actual strings, follow the same idea of enumerating adjacent-pair removals separately from non-adjacent, using the principles above.

Possibilities of removing two characters from string to get unique ones

Answers (1)

Related Questions