paddu
paddu

Reputation: 713

Remove duplicates from a string so that it appears once

Given a string which contains only lowercase letters, remove duplicate letters so that every letter appear once and only once. I must make sure your result is the smallest in lexicographical order among all possible results.

def removeDuplicates(str):
    dict = {}
    word = []
    for i in xrange(len(str)):
        if str[i] not in word:
            word.append(str[i])
            dict[str[i]] = i
        else:
            word.remove(str[i])
            word.append(str[i])
            dict[str[i]] = i

    ind = dict.values()
    # Second scan
    for i in xrange(len(str)):
        if str.index(str[i]) in ind:
            continue
        temp = dict[str[i]]
        dict[str[i]] = i
        lst = sorted(dict.keys(),key = lambda d:dict[d])
        if ''.join(lst) < ''.join(word):
            word = lst
        else:
            dict[str[i]] = temp
    return ''.join(word)

I am not getting the desired result

print removeDuplicateLetters("cbacdcbc")

Input:
"cbacdcbc"
Output:
"abcd"
Expected:
"acdb"

Upvotes: 1

Views: 590

Answers (3)

Reti43
Reti43

Reputation: 9797

Dorian's answer IS the way to go for any practical application, so my addition is mostly toying around.

If a word is really long, it's more efficient to just search whether each letter in the alphabet is in the string and keep only those that are present. Explicitly,

from string import ascii_lowercase

def removeDuplicates(string):
    return ''.join(letter for letter in ascii_lowercase if letter in string)

Code to test timings

import random
import timeit

def compare(string, n):
    s1 = "''.join(sorted(set('{}')))".format(string)
    print timeit.timeit(s1, number=n)
    s2 = "from string import ascii_lowercase; ''.join(letter for letter in ascii_lowercase if letter in '{}')".format(string)
    print timeit.timeit(s2, number=n)

Tests:

>>> word = 'cbacdcbc'
>>> compare(word, 1000)
0.00385931823843
0.013727678263
>>> word = ''.join(random.choice(ascii_lowercase) for _ in xrange(100000))
>>> compare(word, 1000)
2.21139290323
0.0071371927042
>>> word = 'a'*100000 + ascii_lowercase
>>> compare(word, 1000)
2.20644530225
1.63490857359

This shows that Dorian's answer should perform equally well or even better for small words, even though the speed isn't noticeable by humans. However, for very large strings, this method is much faster. Even for an edge case, where every letter is the same and the rest of the letters can only be found by transversing the whole string it performs better.

Still, Dorian's answer is more elegant and practical.

Upvotes: 2

paddu
paddu

Reputation: 713

This is what makes the test succeed.

def removeDuplicates(my_string):
    for char in sorted(set(my_string)):
        suffix = my_string[my_string.index(char):]
        if set(suffix) == set(my_string):
            return char + removeDuplicates(suffix.replace(char, ''))
    return ''

print removeDuplicates('cbacdcbc')

acdb

Upvotes: 0

Dorian Dore
Dorian Dore

Reputation: 962

Use a set. A set is a data structure similar to a list, but it removes all duplicates. You can instantiate a set by doing set(), or setting a variable to a set by using curly brackets. However, this isn't very good for instantiating empty sets, because then Python will think that it's a dictionary. So to achieve what you're doing, you could make the following function:

def removeDuplicates(string):
    return ''.join(sorted(set(string)))

Upvotes: 3

Related Questions