Reputation: 193
I am trying to read a file, make a list of its words, and then make a new list with the duplicates removed. I am not able to append the words to the new list. It fails with: 'NoneType' object has no attribute 'append'
Here is the bit of code:
fh = open("gdgf.txt")
lst = list()
file = fh.read()
for line in fh:
    line = line.rstrip()
file = file.split()
for word in file:
    if word in lst:
        continue
    lst = lst.append(word)
print lst
Upvotes: 2
Views: 5066
Reputation: 312
I think the solution to this problem can be more succinct:
import string
with open("gdgf.txt") as fh:
    word_set = set()
    for line in fh:
        line = line.split()
        for word in line:
            # For each character in string.punctuation, iterate and remove
            # from the word by replacing with '', an empty string
            for char in string.punctuation:
                word = word.replace(char, '')
            # Add the word to the set, skipping tokens that were only punctuation
            if word:
                word_set.add(word)
    word_list = list(word_set)
    # Sort the list to be fastidious.
    word_list.sort()
    print(word_list)
One thing about finding words with "split" is that you are splitting on whitespace, so this will make "words" out of things like "Hello!" and "Really?". The words will include punctuation, which is probably not what you want.
Your variable names could be a bit more descriptive, and your indentation seems a bit off, though that may just be an artifact of pasting the code into the post. I have tried to name my variables after the logical structure each one represents (file, line, word, char, and so on).
To see the contents of string.punctuation, you can launch IPython, import string, and then simply enter string.punctuation to see what it contains.
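As an alternative to replacing one punctuation character at a time, here is a minimal Python 3 sketch that deletes them all in one pass with str.translate. The helper name strip_punctuation and the sample sentence are made up for illustration; they are not part of the answer above.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(word):
    """Return `word` with all ASCII punctuation characters removed."""
    return word.translate(_STRIP_PUNCT)


# Hypothetical sample text standing in for the contents of the file.
words = [strip_punctuation(w) for w in "Hello! Really? Hello".split()]
# Drop tokens that became empty, remove duplicates, and sort.
unique_words = sorted(set(w for w in words if w))
print(unique_words)
```

Note that this also removes apostrophes, so "don't" becomes "dont"; whether that is acceptable depends on the text.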
It is also unclear if you need to have a list, or if you just need a data structure that contains a unique collection of words. A set, or a list that has been properly built to avoid duplicates, should do the trick. Following on from the question, I used a set to store the unique elements, then converted that set to a list, and finally sorted it alphabetically.
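If the order in which the words first appear matters, a set will not preserve it. The following is a small sketch, assuming Python 3.7 or later (where dict keeps insertion order); the function name unique_in_order and the sample text are illustrative only.

```python
def unique_in_order(words):
    """Return the distinct items of `words`, keeping first-seen order."""
    # dict keys are unique and keep insertion order in Python 3.7+.
    return list(dict.fromkeys(words))


# Hypothetical sample text standing in for the contents of the file.
text = "the cat saw the other cat"
ordered_unique = unique_in_order(text.split())
print(ordered_unique)  # ['the', 'cat', 'saw', 'other']
```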
Hope this helps!
Upvotes: 0
Reputation: 174624
You can simplify your code by reading and adding the words directly to a set. Sets do not allow duplicates, so you'll be left with just the unique words:
words = set()
with open('gdgf.txt') as f:
    for line in f:
        # split() breaks the line into whitespace-separated words
        for word in line.split():
            words.add(word)
print(words)
The problem with the logic above is that words that end in punctuation will be counted as separate words:
>>> s = "Hello? Hello should only be twice in the list"
>>> set(s.split())
set(['be', 'twice', 'list', 'should', 'Hello?', 'only', 'in', 'the', 'Hello'])
You can see that the set contains both Hello? and Hello.
You can enhance the code above by using a regular expression to extract words, which will take care of the punctuation:
>>> import re
>>> set(re.findall(r"(\w[\w']*\w|\w)", s))
set(['be', 'list', 'should', 'twice', 'only', 'in', 'the', 'Hello'])
Now your code is:
import re
with open('gdgf.txt') as f:
    words = set(re.findall(r"(\w[\w']*\w|\w)", f.read(), re.M))
print(words)
Even with the above, you'll have duplicates, as Word and word will be counted as two different words. You can enhance it further if you want to store a single version of each word.
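One way to store a single version of each word is to lowercase every match before adding it to the set. This is only a sketch in Python 3: the function name unique_words_ignoring_case and the sample string are assumptions, and it reuses the same word pattern as above.

```python
import re

# Same word pattern as in the answer: letters/digits with optional inner apostrophes.
WORD_RE = re.compile(r"\w[\w']*\w|\w")


def unique_words_ignoring_case(text):
    """Return the set of words in `text`, lowercased so Word == word."""
    return {match.lower() for match in WORD_RE.findall(text)}


# Hypothetical sample text standing in for f.read().
sample = "Word word WORD don't Don't"
folded = sorted(unique_words_ignoring_case(sample))
print(folded)  # ["don't", 'word']
```

For text beyond plain English, str.casefold() may be a better fit than lower(), since it handles more case mappings.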
Upvotes: 0
Reputation: 20339
Python's append returns None, so lst = lst.append(word) replaces your list with None. A set will help here to remove duplicates.
In [102]: mylist = ["aa","bb","cc","aa"]
In [103]: list(set(mylist))
Out[103]: ['aa', 'cc', 'bb']
Hope this helps
In your case:
file = fh.read()
After this call, fh is exhausted: the read position is at the end of the file, so iterating over fh again yields nothing. You have to work with the variable file instead.
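The effect of calling read() before iterating can be seen in a short Python 3 sketch. Here io.StringIO stands in for the opened file so the example does not need a file on disk; the sample contents are made up.

```python
import io

# io.StringIO behaves like a text file object opened for reading.
fh = io.StringIO("one two\nthree two\n")

contents = fh.read()         # consumes the whole stream
lines_after_read = list(fh)  # nothing is left to iterate over
print(lines_after_read)      # []

fh.seek(0)                   # rewind to the beginning
lines_after_seek = list(fh)
print(lines_after_seek)      # ['one two\n', 'three two\n']
```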
Upvotes: 4
Reputation: 90889
list.append() appends in place and returns None (as it does not return anything), so you should not assign the return value of list.append() back to the list. Just change the line lst = lst.append(word) to:
lst.append(word)
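A minimal Python 3 sketch that reproduces the error from the question; the variable names result and error_message are only for illustration.

```python
lst = []
result = lst.append("a")  # append mutates lst in place
print(result)             # None
print(lst)                # ['a']

lst = lst.append("b")     # lst is now None; the list has been lost
try:
    lst.append("c")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```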
Another issue: you first call .read() on the file and then iterate over its lines. You do not need to do both; .read() has already consumed the file, so the loop body never runs. Just remove the iteration part.
Also, an easy way to remove duplicates, if you are not interested in the order of the elements, is to use a set.
Example -
>>> lst = [1,2,3,4,1,1,2,3]
>>> set(lst)
{1, 2, 3, 4}
So, in your case you can initialize lst as lst=set(), and then use lst.add() to add each element; you would not even need to check whether the element already exists. At the end, if you really want the result as a list, do list(lst) to convert it. (Though when doing this, consider renaming the variable to something that makes it clear it is a set, not a list.)
Upvotes: 1
Reputation: 871
append() does not return anything, so don't assign it. lst.append(word) is enough.
Modified Code:
fh = open("gdgf.txt")
lst = []
file=fh.read()
for line in fh:
    line = line.rstrip()
file=file.split()
for word in file:
    if word in lst:
        continue
    lst.append(word)
print lst
I suggest you use set(), because it is meant for unordered collections of unique elements.
fh = open("gdgf.txt")
file = fh.read()
file = file.split()
# set() drops the duplicates; list() converts the result back to a list
lst = list( set(file) )
print lst
Upvotes: 0
Reputation: 52103
append adds an item in place, which means it does not return a useful value (it returns None). You should get rid of lst= when appending word:
if word in lst:
    continue
lst.append(word)
Upvotes: 1
Reputation: 6589
fh=open("gdgf.txt")
file=fh.read()
for line in fh:
    line=line.rstrip()
lst = []
file=file.split()
for word in file:
    lst.append(word)
print (set(lst))
Upvotes: 1
Reputation: 311103
append modifies the list it was called on, and returns None. I.e., you should replace the line:
lst=lst.append(word)
with simply
lst.append(word)
Upvotes: 1
Reputation: 63461
You are replacing your list with the return value of the append function, which is not a list. Simply do this instead:
lst.append(word)
Upvotes: 1