Reputation: 713
Given a string which contains only lowercase letters, remove duplicate letters so that every letter appear once and only once. I must make sure your result is the smallest in lexicographical order among all possible results.
def removeDuplicates(str):
dict = {}
word = []
for i in xrange(len(str)):
if str[i] not in word:
word.append(str[i])
dict[str[i]] = i
else:
word.remove(str[i])
word.append(str[i])
dict[str[i]] = i
ind = dict.values()
# Second scan
for i in xrange(len(str)):
if str.index(str[i]) in ind:
continue
temp = dict[str[i]]
dict[str[i]] = i
lst = sorted(dict.keys(),key = lambda d:dict[d])
if ''.join(lst) < ''.join(word):
word = lst
else:
dict[str[i]] = temp
return ''.join(word)
I am not getting the desired result
print removeDuplicateLetters("cbacdcbc")
Input:
"cbacdcbc"
Output:
"abcd"
Expected:
"acdb"
Upvotes: 1
Views: 590
Reputation: 9797
Dorian's answer IS the way to go for any practical application, so my addition is mostly toying around.
If a word is really long, it's more efficient to just search whether each letter in the alphabet is in the string and keep only those that are present. Explicitly,
from string import ascii_lowercase
def removeDuplicates(string):
return ''.join(letter for letter in ascii_lowercase if letter in string)
Code to test timings
import random
import timeit
def compare(string, n):
s1 = "''.join(sorted(set('{}')))".format(string)
print timeit.timeit(s1, number=n)
s2 = "from string import ascii_lowercase; ''.join(letter for letter in ascii_lowercase if letter in '{}')".format(string)
print timeit.timeit(s2, number=n)
Tests:
>>> word = 'cbacdcbc'
>>> compare(word, 1000)
0.00385931823843
0.013727678263
>>> word = ''.join(random.choice(ascii_lowercase) for _ in xrange(100000))
>>> compare(word, 1000)
2.21139290323
0.0071371927042
>>> word = 'a'*100000 + ascii_lowercase
>>> compare(word, 1000)
2.20644530225
1.63490857359
This shows that Dorian's answer should perform equally well or even better for small words, even though the speed isn't noticeable by humans. However, for very large strings, this method is much faster. Even for an edge case, where every letter is the same and the rest of the letters can only be found by transversing the whole string it performs better.
Still, Dorian's answer is more elegant and practical.
Upvotes: 2
Reputation: 713
This is what makes the test succeed.
def removeDuplicates(my_string):
for char in sorted(set(my_string)):
suffix = my_string[my_string.index(char):]
if set(suffix) == set(my_string):
return char + removeDuplicates(suffix.replace(char, ''))
return ''
print removeDuplicates('cbacdcbc')
acdb
Upvotes: 0
Reputation: 962
Use a set
. A set is a data structure similar to a list, but it removes all duplicates. You can instantiate a set by doing set()
, or setting a variable to a set by using curly brackets. However, this isn't very good for instantiating empty sets, because then Python will think that it's a dictionary. So to achieve what you're doing, you could make the following function:
def removeDuplicates(string):
return ''.join(sorted(set(string)))
Upvotes: 3