Reputation: 71
I was working through chapter 10.15 of the book Think Python and wrote the following code:
def turn_str_to_list(string):
    res = []
    for letter in string:
        res.append(letter)
    return res

def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t

line=raw_input('>>>')
t=turn_str_to_list(line)
print t
print sort_and_unique(t)
I used a nested 'for' loop to eliminate duplicated elements in a sorted list. However, when I ran it, I kept getting wrong output. If I input 'committee', the output is ['c', 'e', 'i', 'm', 'o', 't', 't'], which is wrong because it still contains a double 't'. I tried different inputs: sometimes the program misses duplicated letters in the middle of the list, and it always misses the ones at the end. What am I missing? Thanks, guys.
Upvotes: 3
Views: 171
Reputation: 44092
So you want an explanation of what is wrong in your code. Here you are:
Our coding goes faster if we have a test case at hand from the very beginning.
For testing I will make a small utility function:
def textinout(text):
    return "".join(sort_and_unique(list(text)))
This allows quick test like:
>>> textinout("committee")
"ceimot"
and another helper function for readable error traces:
def checkit(textin, expected):
    msg = "For input '{textin}' we expect '{expected}', got '{result}'"
    result = textinout(textin)
    assert result == expected, msg.format(textin=textin, expected=expected, result=result)
And make the test case function:
def testit():
    checkit("abcd", 'abcd')
    checkit("aabbccdd", 'abcd')
    checkit("a", 'a')
    checkit("ddccbbaa", 'abcd')
    checkit("ddcbaa", 'abcd')
    checkit("committee", 'ceimot')
Let us run the first test with the existing function:
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t
Now we can test it:
testit()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-11-291a15d81032> in <module>()
----> 1 testit()
<ipython-input-4-d8ad9abb3338> in testit()
1 def testit():
2 checkit("abcd", 'abcd')
----> 3 checkit("aabbccdd", 'abcd')
4 checkit("a", 'a')
5 checkit("ddccbbaa", 'abcd')
<ipython-input-10-620ac3b14f51> in checkit(textin, expected)
2 msg = "For input '{textin}' we expect '{expected}', got '{result}'"
3 result = textinout(textin)
----> 4 assert result == expected, msg.format(textin=textin, expected=expected, result=result)
AssertionError: For input 'aabbccdd' we expect 'abcd', got 'abcdd'
Reading the last line of the error trace, we know what is wrong.
Looping over indexes with range(len(lst)) is, in most cases, not efficient and makes the code hard to read.
Instead of:
lst = ["a", "b", "c"]
for i in range(len(lst)):
    itm = lst[i]
    # do something with the itm
You should use:
lst = ["a", "b", "c"]
for itm in lst:
    # do something with the itm
    print itm
If you need to access a subset of a list, use slicing.
Instead of:
for i in range (0, len(lst)-2, 1):
    itm = lst[i]
Use:
for itm in lst[:-2]:
    # do something with the itm
    print itm
If you really need to know the position of the processed item for inner loops, use enumerate:
Instead of:
lst = ["a", "b", "c", "d", "e"]
for i in range(0, len(lst)):
    for j in range (i+1, len(lst)-1, 1):
        itm_i = lst[i]
        itm_j = lst[j]
        # do something
Use enumerate, which turns each list item into a tuple (index, item):
lst = ["a", "b", "c", "d", "e"]
for i, itm_i in enumerate(lst):
    for itm_j in lst[i+1:-1]:
        print itm_i, itm_j
        # do something
You are looping over a list and suddenly delete an item from it. Modifying a list while iterating over it is generally better avoided. If you have to do it, think twice and take care, for example by iterating backward, so that you do not modify the part that is about to be processed in a later iteration.
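A minimal sketch of the backward-iteration idea, assuming the list is already sorted so that duplicates sit next to each other. The function name dedupe_sorted_backward is hypothetical and not part of the original code:

```python
def dedupe_sorted_backward(lst):
    """Remove adjacent duplicates in place from an already sorted list."""
    # Walk from the last index down to 1. Deleting lst[i] only shifts
    # items at higher positions, which have already been processed.
    for i in range(len(lst) - 1, 0, -1):
        if lst[i] == lst[i - 1]:
            del lst[i]
    return lst


letters = sorted("committee")
print(dedupe_sorted_backward(letters))
```

Because the index only moves toward the start of the list, a deletion never changes the position of an item that is still waiting to be compared.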
As an alternative to deleting an item from the list being iterated, you can record your findings (such as duplicated items) in another list and use it once you are out of the loop.
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-2]):
        for j, itm_j in enumerate(lst[i+1: -1]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    # now we are out of loop and can modify the lst
    # note, we loop over one list and modify another, this is safe
    for itm in to_remove:
        lst.remove(itm)
    return lst
Reading the code reveals the problem: you never touch the last item in the sorted list. That is why the duplicate "t" is not removed: it is alphabetically the last item after sorting.
So your code could be corrected this way:
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-1]):
        for j, itm_j in enumerate(lst[i+1:]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst
From now on, the code passes the test cases, which you can verify by calling testit():
>>> testit()
Silent test output is what we were dreaming about.
Having the test function makes further code modification easy, as it is quick to check whether things still work as expected.
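As a sketch of how such a quick check can expose a remaining limit of the nested-loop version: the input "aaa" and the name sort_and_unique_pairs are my additions, not part of the original tests. With a letter that occurs three times, every pair of equal letters is recorded, so the letter is removed three times:

```python
def sort_and_unique_pairs(lst):
    """Nested-loop version from above: compares every pair of items."""
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-1]):
        for itm_j in lst[i + 1:]:
            if itm_i == itm_j:
                to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst


def textinout(text):
    return "".join(sort_and_unique_pairs(list(text)))


# Hypothetical extra case: "aaa" contains three equal pairs,
# so three removals are recorded and no "a" survives.
print(repr(textinout("committee")))  # 'ceimot'
print(repr(textinout("aaa")))        # '' rather than the expected 'a'
```

Comparing only neighbouring items of the sorted list records each surplus copy once, which avoids this.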
Anyway, the code can be shortened by getting the tuples (itm_i, itm_j) using zip:
def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for itm_i, itm_j in zip(lst[:-1], lst[1:]):
        if itm_i == itm_j:
            to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
or using list comprehension:
def sort_and_unique (lst):
    lst.sort()
    to_remove = [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]
    for itm in to_remove:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
Because a list comprehension (using []) builds the complete list before any of its values are used, we can remove another line:
def sort_and_unique (lst):
    lst.sort()
    for itm in [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]:
        lst.remove(itm)
    return lst
Test it:
>>> testit()
Note that so far the code still reflects your original algorithm; only two bugs were removed:
- we no longer modify the list we are iterating over
- the last item of the list is now taken into account
Upvotes: 0
Reputation: 425
The reason your program isn't removing all the duplicate letters is that the use of del t[j] in the nested for-loops causes the program to skip letters.
I added some prints to help illustrate this:
def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        print "i: %d" % i
        print t
        for j in range (i+1, len(t)-1, 1):
            print "\t%d %s len(t):%d" % (j, t[j], len(t))
            if t[i]==t[j]:
                print "\tdeleting %c" % t[j]
                del t[j]
    return t
Output:
>>>committee
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
i: 0
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
    1 e len(t):9
    2 e len(t):9
    3 i len(t):9
    4 m len(t):9
    5 m len(t):9
    6 o len(t):9
    7 t len(t):9
i: 1
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
    2 e len(t):9
    deleting e
    3 m len(t):8
    4 m len(t):8
    5 o len(t):8
    6 t len(t):8
    7 t len(t):8
i: 2
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
    3 m len(t):8
    4 m len(t):8
    5 o len(t):8
    6 t len(t):8
i: 3
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
    4 m len(t):8
    deleting m
    5 t len(t):7
    6 t len(t):7
i: 4
['c', 'e', 'i', 'm', 'o', 't', 't']
    5 t len(t):7
i: 5
['c', 'e', 'i', 'm', 'o', 't', 't']
i: 6
['c', 'e', 'i', 'm', 'o', 't', 't']
['c', 'e', 'i', 'm', 'o', 't', 't']
Whenever del t[j] is called, the list becomes one element smaller, but the inner for-loop keeps iterating over its original range of j values.
For example:
i=1, j=2, t = ['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
It sees that t[1] == t[2] (both 'e') so it removes t[2].
Now t = ['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
However, the code continues with i=1, j=3, which compares 'e' to 'm' and skips over 'i'.
Lastly, it does not catch the last two 't's because by the time i=5, len(t) is 7, so the range of the inner for-loop is range(6,6,1), which is empty, and the loop body is not executed.
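One possible way around the skipping, sketched under the assumption that you want to keep deleting in place: use a while loop whose bound is re-evaluated on every pass, and advance the index only when nothing was deleted. The name sort_and_unique_while is hypothetical:

```python
def sort_and_unique_while(t):
    """Sort t and remove duplicates in place without skipping items."""
    t.sort()
    i = 0
    # len(t) is checked again on every pass, so the bound follows
    # the shrinking list, and the last item is included.
    while i < len(t) - 1:
        if t[i] == t[i + 1]:
            # After the deletion a new item sits at i + 1,
            # so stay at the same i and compare again.
            del t[i + 1]
        else:
            i += 1
    return t


print(sort_and_unique_while(list("committee")))
```

Since the list is sorted, comparing each item only with its right-hand neighbour is enough to find every duplicate.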
Upvotes: 4
Reputation: 34657
Here you go:
In [1]: word = 'committee'
In [3]: word_ = set(word)
In [4]: word_
Out[4]: {'c', 'e', 'i', 'm', 'o', 't'}
The standard way to get the unique elements in Python is to use a set. The set constructor takes any iterable. A string is an iterable sequence of characters (ASCII codes or Unicode code points), so it qualifies.
If you have further problems, do leave a comment.
Upvotes: 1
Reputation: 3000
In Python you can make use of the built-in data structures and functions like set() and list().
Your turn_str_to_list() can be done with list(). Maybe you know this but wanted to do it on your own.
Using the list() and set() APIs:
line=raw_input('>>>')
print list(set(line))
Your sort_and_unique() has O(n^2) complexity. One way to make it cleaner:
def sort_and_unique2(t):
    t.sort()
    res = []
    for i in t:
        if i not in res:
            res.append(i)
    return res
This is still O(n^2), since the lookup (i not in res) takes linear time, but the code looks a bit cleaner. Deletion from a list is O(n), so you can instead append to a new list, since append is O(1). See this page for the complexities of the list API: https://wiki.python.org/moin/TimeComplexity
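A sketch of how the linear lookup could be avoided, relying on the fact that equal items are adjacent after sorting, so each item only needs to be compared with the last one appended. The name sort_and_unique3 is hypothetical, and unlike the version above it leaves the input list unchanged:

```python
def sort_and_unique3(t):
    """Return a sorted list of the unique items of t."""
    res = []
    for item in sorted(t):
        # Equal items are adjacent after sorting, so checking the
        # last appended item replaces the linear "item not in res".
        if not res or res[-1] != item:
            res.append(item)
    return res


print(sort_and_unique3(list("committee")))
```

With this change the loop does constant work per item, so the overall cost is dominated by the sort.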
Upvotes: 2
Reputation: 44092
>>> word = "committee"
Turn string to list of characters:
>>> clst = list(word)
>>> clst
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
Use set to get only unique items:
>>> unq_clst = set(clst)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}
It turns out (thanks Blckknght) that the list step is not necessary, and we can do it this way:
>>> unq_clst = set(word)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}
Both set and list take an iterable as a parameter, and iterating over a string yields one character after another.
Sort it:
>>> sorted(unq_clst)
['c', 'e', 'i', 'm', 'o', 't']
>>> sorted(set("COMMITTEE"))
['C', 'E', 'I', 'M', 'O', 'T']
Upvotes: 1
Reputation: 219
You can try the following code snippet:
s = "committee"
res = sorted(set(s))
Upvotes: 1