Maybe

Reputation: 83

Append in Python

I want to open a file and read it line by line. For each line, I want to split the line into a list of words using the split() method. Then I want to check each word on each line to see whether it is already in the list and, if not, append it to the list. This is the code that I have written.

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = list()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        if stuff not in stuff:
            line1.append(stuff)
print line1

My problem is that when I print line1, it prints about 30 duplicate lists in a format like this:

['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'],
['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'],
['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun'],
['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun'],
['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'],
['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'],

I want to know why that problem is happening and how to delete the duplicate words and lists.

Upvotes: 0

Views: 108

Answers (1)

J. Doe

Reputation: 36

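Your condition is "if stuff not in stuff", which asks whether the list of words for the current line is an element of itself. Its elements are strings, so that test is always true, and the whole list stuff gets appended once for every word on the line. That is where the duplicate lists come from. If you change that line to "if word not in line1:" and the next line to "line1.append(word)", your code should work.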
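For reference, that fix might look like the sketch below. It is written for Python 3 rather than the Python 2 used in the question, and it wraps the loop in a hypothetical helper, unique_words, that accepts any iterable of lines (an open file works). The function name and the sample lines are assumptions for illustration, not part of the original code.

```python
def unique_words(lines):
    """Collect each distinct word once, in order of first appearance."""
    seen = []  # plays the role of line1 in the question
    for line in lines:
        for word in line.rstrip().split():
            # Compare the individual word against the result list,
            # not the per-line list against itself.
            if word not in seen:
                seen.append(word)
    return seen


# Sample lines standing in for the file contents (assumed for illustration).
sample = [
    "But soft what light through yonder window breaks\n",
    "It is the east and Juliet is the sun\n",
]
print(unique_words(sample))
```

With a real file you could pass the file object directly, for example unique_words(open(fname)).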

Alternatively, use sets.

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        line1.add(word)
print line1

or even

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    line1 = line1.union(set(stuff))
print line1

Sets will only contain unique values (although they have no concept of ordering or indexing), so you would not need to deal with checking whether a word has come up already: the set data type takes care of that automatically.
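Because a set has no defined order, the printed result can come out in any order. If you want something predictable, one option is to sort the set at the end. The sketch below assumes Python 3; the helper name unique_words_sorted and the sample lines are hypothetical. It also uses set.update(), which adds to the existing set in place instead of building a new set on every line as union() does.

```python
def unique_words_sorted(lines):
    """Return the distinct words from the lines as a sorted list."""
    words = set()
    for line in lines:
        # update() adds every word from this line in place;
        # words already in the set are simply ignored.
        words.update(line.split())
    # A set has no defined order, so sort for a predictable result.
    return sorted(words)


# Sample lines standing in for the file contents (assumed for illustration).
sample = [
    "Arise fair sun and kill the envious moon\n",
    "It is the east and Juliet is the sun\n",
]
print(unique_words_sorted(sample))
```

Note that the default string sort puts capitalized words before lowercase ones.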

Upvotes: 2
