Reputation: 83
I want to open a file and read it line by line. For each line, I want to split the line into a list of words using the split() method. Then I want to check each word on each line to see whether it is already in the list and, if not, append it to the list. This is the code that I have written:
fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = list()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        if stuff not in stuff:
            line1.append(stuff)
print line1
My problem is that when I print line1, it prints about 30 duplicate lists in a format like this:
['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'],
['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'], ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun'],
['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'],
['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'],
I want to know why that problem is happening and how to delete the duplicate words and lists.
Upvotes: 0
Views: 108
Reputation: 36
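Your test is `if stuff not in stuff`, which asks whether the list `stuff` contains itself. A list of strings never contains a list, so the condition is always true and the whole list is appended to `line1` once for every word on the line. That is why you see duplicate lists. If you change that line to `if word not in line1:` and the next line to `line1.append(word)`, your code should work.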
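The list-based fix described above can be sketched as follows. This is only an illustration under a few assumptions: it uses Python 3 syntax (the question's code is Python 2), the helper name `unique_words` is made up for this example, and it takes an in-memory list of lines instead of prompting for a file name so that it runs on its own.

```python
def unique_words(lines):
    """Return the distinct words in `lines`, in order of first appearance."""
    seen = []                      # plays the role of line1 in the question
    for line in lines:
        for word in line.rstrip().split():
            if word not in seen:   # test the single word, not the whole list
                seen.append(word)
    return seen


# Two sample lines taken from the output shown in the question.
sample = [
    "But soft what light through yonder window breaks\n",
    "It is the east and Juliet is the sun\n",
]
result = unique_words(sample)
print(result)
```

An open file handle can be passed in place of `sample`, since iterating over a file also yields one line at a time.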
Alternatively, use sets.
fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        line1.add(word)
print line1
or even
fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    line1 = line1.union(set(stuff))
print line1
Sets will only contain unique values (although they have no concept of ordering or indexing), so you would not need to deal with checking whether a word has come up already: the set data type takes care of that automatically.
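As a small illustration of that behaviour (a sketch in Python 3 syntax, with the variable names `words` and `unique_sorted` chosen just for this example), adding a word that is already present leaves the set unchanged, and `sorted()` can be used when a predictable order is needed for display:

```python
words = set()
for line in ["It is the east and Juliet is the sun\n"]:
    # update() adds every word from the line; repeated words are ignored
    words.update(line.split())

# Sets are unordered, so sort the words to get a stable order for printing.
unique_sorted = sorted(words)
print(unique_sorted)
```

The line has nine words, but "is" and "the" each appear twice, so the set ends up with seven entries.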
Upvotes: 2