Reputation: 9
How do I find the list of duplicates in a list of strings? The clean_up function is given:
def clean_up(s):
    """ (str) -> str
    Return a new string based on s in which all letters have been
    converted to lowercase and punctuation characters have been stripped
    from both ends. Inner punctuation is left untouched.
    >>> clean_up('Happy Birthday!!!')
    'happy birthday'
    >>> clean_up("-> It's on your left-hand side.")
    " it's on your left-hand side"
    """
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    result = s.lower().strip(punctuation)
    return result
Here is my duplicate function.
def duplicate(text):
    """ (list of str) -> list of str
    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
    ... 'James Gosling\n']
    >>> duplicate(text)
    ['james']
    """
    cleaned = ''
    non_duplicate = []
    unique = []
    for word in text:
        cleaned += clean_up(word).replace(",", " ") + " "
        words = cleaned.split()
        for word in words:
            if word in unique:
I am stuck here. I can't use a dictionary or any other technique that keeps a count of the frequency of each word in the text. Please help.
Upvotes: 1
Views: 296
Reputation: 122023
You have a problem here:
cleaned += clean_up(word).replace(",", " ") + " "
This line adds the new "word" (really a whole line of text) to a growing string of all the words so far. Therefore, each time through the for loop, you recheck every word you have already seen.
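To make the rechecking visible, here is a small sketch. It assumes that the split happens inside the loop, as described above; the `snapshots` list is a hypothetical helper added only to record what gets re-split on each pass, and the input is a shortened version of the example from the question.

```python
def clean_up(s):
    # Same behaviour as the clean_up given in the question.
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)


text = ['James Cooper\n', 'James Gosling\n']

cleaned = ''
snapshots = []  # hypothetical: what cleaned.split() yields on each pass
for phrase in text:
    cleaned += clean_up(phrase).replace(",", " ") + " "
    snapshots.append(cleaned.split())

for words in snapshots:
    print(words)
```

On the second pass the split list contains the words from the first line again, so any membership test in that loop sees them a second time even though they only occur once in the input.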
Instead, you need to do:
for phrase in text:
    for word in phrase.split(" "):
        word = clean_up(word)
This means you only process each word once. You may then need to add it to one of your lists, depending on whether it's already in either of them. I suggest you call your lists seen and duplicates, to make it clearer what is going on.
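Putting that together, one possible version of the whole function might look like the sketch below. It uses only two lists, with no dictionary and no counts. Two details are my own assumptions rather than part of the original code: it calls `split()` with no argument instead of `split(" ")`, so repeated spaces do not produce empty strings, and it skips any token that is empty after cleaning.

```python
def clean_up(s):
    # Same behaviour as the clean_up given in the question.
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)


def duplicate(text):
    """(list of str) -> list of str

    Return the words that occur more than once in text, in the order
    in which each one is first seen to repeat.
    """
    seen = []        # every distinct word encountered so far
    duplicates = []  # words encountered at least twice
    for phrase in text:
        # Assumption: split() with no argument, so runs of whitespace
        # do not yield empty strings.
        for word in phrase.split():
            word = clean_up(word)
            if not word:
                continue  # the token was punctuation only
            if word in seen:
                if word not in duplicates:
                    duplicates.append(word)
            else:
                seen.append(word)
    return duplicates


text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
        'James Gosling\n']
print(duplicate(text))  # ['james']
```

The `word not in duplicates` check keeps a word that appears three or more times from being reported more than once.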
Upvotes: 1