Reputation: 9
I need to return True if there are any duplicate words in the file. This is what I have, but it is not correct.
def duplicate(filename):
    infile = open(filename)
    contents = infile.read()
    infile.close()
    words = contents.split()
    for word in words:
        if words.count(word) > 1:
            return True
        else:
            return False
File contents:
This is a file with a duplicate. Just one.
You may try to find another but you'll never see it.
Upvotes: 0
Views: 176
Reputation: 13106
Usually a dictionary is nice for this kind of task (I'd suggest using a Counter, but I don't think you're quite there yet).
Dictionaries are great for grouping data, since the keys are unique, and they are really useful for membership testing, since the speed of the test does not depend on the size of the dict. In this case, you can store the words as keys and the counts as values. Then return True on the first dupe, which it looks like you tried to do:
def has_duplicate(filename):
    # create the dictionary here
    words = {}
    # it is best to use a with statement to open a file
    # that way you don't have to close it
    with open(filename) as infile:
        # you can iterate directly over the file
        for line in infile:
            for word in line.split():
                # if the word is in the dictionary
                # then you've seen it before and it's a duplicate
                if word in words:
                    return True
                # Otherwise, add it
                else:
                    words[word] = 1
    return False
One caveat: this won't handle differences in capitalization or punctuation.
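If capitalization and punctuation should be ignored, one possible approach is to normalize each word before the membership test. The sketch below is only an illustration: the names normalize and has_duplicate_words are made up for this example, it uses a set instead of a dict (only membership matters here), and it assumes that stripping ASCII punctuation from the ends of each word is good enough for your data.

```python
import string

def normalize(word):
    # Hypothetical helper: strip leading/trailing ASCII punctuation, then lowercase.
    # Punctuation inside the word (e.g. the apostrophe in "you'll") is kept.
    return word.strip(string.punctuation).lower()

def has_duplicate_words(lines):
    # `lines` can be any iterable of strings, including an open file object.
    seen = set()
    for line in lines:
        for raw in line.split():
            word = normalize(raw)
            if not word:
                # The token was only punctuation; skip it.
                continue
            if word in seen:
                return True
            seen.add(word)
    return False
```

To run it on a file, pass the open file object directly, e.g. has_duplicate_words(infile) inside a with open(filename) as infile: block.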
Upvotes: 0
Reputation: 191854
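You're returning after checking only the first word: the else branch returns False as soon as the first word turns out not to be repeated. Don't return False until you've inspected all the words: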
for word in words:
    if words.count(word) > 1:
        return True
return False
Also, you're not stripping punctuation, so word! would be treated as different from word.
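It's also more performant to use a Counter object (from the collections module) than to call words.count() for every word, since count() rescans the whole list each time.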
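As a rough sketch of the Counter idea, assuming the whole text fits in memory and that words are split on whitespace only (no punctuation or case handling), something like the following could work. The function names duplicated_words and has_duplicate_text are hypothetical, chosen just for this example.

```python
from collections import Counter

def duplicated_words(text):
    # Count every whitespace-separated word in a single pass.
    counts = Counter(text.split())
    # Keep only the words that occur more than once, in first-seen order.
    return [word for word, n in counts.items() if n > 1]

def has_duplicate_text(text):
    # True if any word occurs more than once.
    return any(n > 1 for n in Counter(text.split()).values())
```

Note that a Counter tallies every word before you can inspect the counts, so unlike the loop above it cannot stop at the first duplicate. In exchange, it tells you which words are repeated.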
Plus, it's better to open the file with a with statement, like so:
with open(filename) as infile:
    lines = infile.readlines()
    for line in lines:
        for word in line.split():
            ...
return False
Upvotes: 2