comparing a string[i] with another string [i] regaring duplicates

Question

so the program asks for two strings then. the program should return text1 where each letter of the secret has been put in brackets. So that if you read out only the letters of text1 that are in brackets you get the secret text.

Examples

enter text: qmweortnasdftzxcvyqwer
enter secret: monty
q[m]we[o]rt[n]asdf[t]zxcv[y]qwer

enter text: amaoa
enter secret: monty python
a[m]a[o]a

enter text: abcmasoaasnaataayas aspasyastashasoasnas
enter secret:monty
abc[m]as[o]aas[n]aa[t]aa[y]as aspas[y]astashasoasnas

code

text = input("enter text: ")
secret = input("enter secret text: ")
len_text = len(text)
len_secret = len(secret)
final_answer = ""

for i in range(len_text):

    for k in range(len_secret):
        if secret[k] == text[i] and secret[k] not in final_answer:
            final_answer += "[" + secret[k] + "]"

    if text [i] not in final_answer: 
        final_answer += text[i]


print(final_answer)

error in code

I've tried a couple different ways but this is the closest code I can come up with that gets me closer to the answer. all the others I have tried end up with more letters or repeats.

enter text: monty python
enter secret: monty python

 [m][o][n][t][y][ ][p][h]<<

abarnert · Accepted Answer

The root problem here is that you're thinking in terms of sets.

First, either a letter is a member of secret, or it isn't. So, each letter that's in secret is a member of secret no matter how many times it appears, so you end up with too many duplicates—hello and lo will give you he[l][l][o] instead of just he[l]l[o].

You attempt to fix that by using another set, the set of letters used so far, but that has the same problem: either a letter has been used so far, or it hasn't. So now you end up using each letter only once, even if it appears multiple times in secret, and you have no duplicates at all: hello and hello gives you [h][e][l]l[o] instead of [h][e][l][l][o].

The key is that secret is not a set, it's a multiset: letters don't just appear or not appear, they appear a certain number of times. You want to use each letter in secret as many times as it appears—not infinite times, not just once. The only way to do this is to keep track of how many times it appears, and how many times you've used it.

The smallest change that would fix this is to just remove each letter as you find it. Like this:

for letter in text:
    if letter in secret:
        secret = secret.remove(letter, 1) # use it up
        final_answer += "[" + letter + "]"
    else:
        final_answer += letter

But a much nicer solution is to actually store the secret letters as a multiset. That's exactly what Counter does:

from collections import Counter
secret = Counter(secret)
for letter in text:
    if secret[letter]:
        secret[letter] -= 1 # use it up
        final_answer += "[" + letter + "]"
    else:
        final_answer += letter

This doesn't look much different on the surface, but if you look at what's happening under the covers, it's a lot simpler.

When secret is a string, if letter in secret actually has to search every character of the string and compare it to letter. And then, secret.remove(letter, 1) has to search the string all over again to find the same letter, and then it has to copy the whole string minus that letter into a new string to return to you.

When secret is a Counter, if letter in secret just looks up letter in the hash table and checks the number there. And then, secret[letter] -= 1 just decrements the number. Instead of two linear searches and a linear copy, we're just doing a direct lookup and subtraction.

Of course the performance is unlikely to matter in this case. And Python wraps up all that linear searching and copying so it looks easy, even if what happens under the covers isn't. So, if you're having a hard time understanding how Counter works, or what a multiset is, don't feel too bad about sticking with the string, but make a note to come back later and see if you can see the differences once you've learned a bit more.

However, although this solves all of your examples correctly, I'm still not sure you're describing the problem correctly. You can only "read out the secret text" if all of the letters in secret happen to appear in the same order as in text. That happens to be true in all of your examples, but what's the right answer for, say, abcdeabcde and db? Should it be a[b]c[d]eabcde, or should it be abc[d]ea[b]cde?

If it's the latter, and the order of secret matters, then we don't want a set, or a multiset, we want a list. And we only want to check the first letter of it each time.

The smallest change, then, is:

for letter in text:
    if secret and secret[0] == letter:
        secret = secret[1:] # use it up
        final_answer += "[" + letter + "]"
    else:
        final_answer += letter

Notice that I'm checking secret before checking secret[0] == letter. This is because, after we've used up all the letters in secret, there is no more secret[0], so that would give us an IndexError.

There's no more double-linear-searching, of course, but we're still copying the string each time. A cleaner solution would be to use a list, where just popping the last value off is instantaneous:

secret = list(secret)[::-1]
for letter in text:
    if secret and secret[-1] == letter:
        secret.pop() # use it up
        final_answer += "[" + letter + "]"
    else:
        final_answer += letter

comparing a string[i] with another string [i] regaring duplicates

Answers (1)

Related Questions