Reputation: 31218
I'm trying to read a text file (foo1.txt), remove all the NLTK-defined stopwords, and write the result to another file (foo2.txt). The code is as follows.
Required import:
    from nltk.corpus import stopwords
    def stop_words_removal():
        with open("foo1.txt") as f:
            reading_file_line = f.readlines() #entire content, return list
            #print reading_file_line #list
            reading_file_info = [item.rstrip('\n') for item in reading_file_line]
            #print reading_file_info #List and strip \n
            #print ' '.join(reading_file_info)
            '''-----------------------------------------'''
            #Filtering & converting to lower letter
            for i in reading_file_info:
                words_filtered = [e.lower() for e in i.split() if len(e) >= 4]
                print words_filtered
            '''-----------------------------------------'''
            '''removing the stop words from the file'''
            word_list = words_filtered[:]
            #print word_list
            for word in words_filtered:
                if word in nltk.corpus.stopwords.words('english'):
                    print word
                    print word_list.remove(word)
            '''-----------------------------------------'''
            '''write the output in a file'''
            z = ' '.join(words_filtered)
            out_file = open("foo2.txt", "w")
            out_file.write(z)
            out_file.close()
The problem is that the second part of the code, the step that removes the stop words from the file, does not work. Any suggestions would be greatly appreciated. Thanks.
Example Input File:
'I a Love this car there', 'positive',
'This a view is amazing there', 'positive',
'He is my best friend there', 'negative'
Example Output:
['love', "car',", "'positive',"]
['view', "amazing',", "'positive',"]
['best', "friend',", "'negative'"]
I tried the suggestions in this link, but none of them worked.
Upvotes: 2
Views: 8325
Reputation: 40993
This is what I would do, inside your function:
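    stop_words = set(stopwords.words('english'))  # build the set once, not per word
    with open('input.txt', 'r') as inFile, open('output.txt', 'w') as outFile:
        for line in inFile:
            # Python 2 str.translate; on Python 3 pass str.maketrans('', '', string.punctuation) instead
            words = line.lower().translate(None, string.punctuation).split()
            # join with a space so the words stay separated
            print(' '.join(word for word in words
                           if len(word) >= 4 and word not in stop_words), file=outFile)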
Don't forget to add, at the top of the file:
    import string
    from __future__ import print_function
The `__future__` import is only needed if you are on Python 2.x.
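On Python 3, `str.translate` no longer accepts `None` as its first argument, so the same idea might look roughly like the sketch below. The names `filter_line`, `remove_stopwords_from_file` and the tiny `DEMO_STOPWORDS` set are placeholders of mine, not part of NLTK; the set is only there so the example runs without the NLTK corpus, and in real use you would pass `set(stopwords.words('english'))` instead. The sketch also sidesteps two problems in the posted code: `list.remove` returns `None` (so `print word_list.remove(word)` prints `None`), and the text written to foo2.txt is built from `words_filtered` rather than from the cleaned `word_list`.

```python
import string

# Stand-in stop word set so the sketch runs without NLTK data (an assumption).
# In real use: from nltk.corpus import stopwords
#              stop_words = set(stopwords.words('english'))
DEMO_STOPWORDS = {"this", "there", "that", "with", "have"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def filter_line(line, stop_words, min_len=4):
    """Lowercase a line, strip punctuation, drop short words and stop words."""
    words = line.lower().translate(_STRIP_PUNCT).split()
    return [w for w in words if len(w) >= min_len and w not in stop_words]


def remove_stopwords_from_file(in_path, out_path, stop_words):
    """Write one filtered line to out_path for every line of in_path."""
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        for line in in_file:
            print(" ".join(filter_line(line, stop_words)), file=out_file)


if __name__ == "__main__":
    print(filter_line("'I a Love this car there', 'positive',", DEMO_STOPWORDS))
```

Note that once the punctuation is stripped, "car" is only three letters long, so the `len(word) >= 4` test drops it; lower `min_len` if you want to keep words like that.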
Upvotes: 3