I would like to extract only nouns or noun groups from a huge text file. The Python code below works, but it only extracts the nouns from the last line. I am pretty sure the code needs 'append', but I don't know how to use it (I am a Python beginner).
import nltk
import nltk.tokenize
import numpy
f = open(r'infile.txt', encoding="utf8")
data = f.readlines()
tagged_list = []
for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    nouns = [word for word, pos in tagged
             if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS')]
    downcased = [x.lower() for x in nouns]
    joined = " ".join(downcased).encode('utf-8')
    into_string = str(nouns)
output = open(r"outfile.csv", "wb")
output.write(joined)
output.close()
The output is "apartment transport downtown", which are the nouns from the last line of the file only. I'd like to write the nouns from each input line on their own line of the output file. For example, the input file and the corresponding output should look like this:
Input file:
I like the milk.
I like the milk and bread.
I like the milk, bread, and butter.
Output file:
milk
milk bread
milk bread butter
I hope somebody can help me fix the code above.
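Collect each line's result at the end of the for loop, then write everything to the file after the loop:
...
result = ""
for line in data:
    ...
    # joined is bytes in the question's code, so decode it back to str;
    # "\n" puts each input line's nouns on its own output line
    result += joined.decode("utf-8") + "\n"
output = open(r"outfile.csv", "w", encoding="utf8")
output.write(result)
output.close()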
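A minimal illustration of why the accumulation is needed, using plain strings in place of the per-line noun strings (the names `lines`, `last` and `collected` exist only for this example): a name that is reassigned inside the loop, like `joined` in the question's code, holds only the final iteration's value once the loop ends, whereas a list created before the loop and appended to keeps every value.

```python
# Plain strings stand in for the per-line noun strings built in the loop.
lines = ["milk", "milk bread", "milk bread butter"]

# Rebinding a name on each pass keeps only the final value,
# which is what happens to `joined` in the question's code.
for line in lines:
    last = line

# Appending to a list created before the loop keeps every value.
collected = []
for line in lines:
    collected.append(line)

print(last)                  # only the last line survives
print("\n".join(collected))  # one entry per input line
```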
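If you want to use append:
...
result_list = []
for line in data:
    ...
    # store one str per input line (joined is bytes in the question's code)
    result_list.append(joined.decode("utf-8"))
output = open(r"outfile.csv", "w", encoding="utf8")
# str(result_list) would write a Python list literal, so join the items instead
output.write("\n".join(result_list) + "\n")
output.close()
Alternatively, with result_list you can write the items one at a time:
...
output = open(r"outfile.csv", "w", encoding="utf8")
for item in result_list:
    output.write(item + "\n")
output.close()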
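Putting the pieces together, here is a sketch of the whole task that uses `with` blocks (so the files are closed even if an error occurs) and skips the bytes/str round-trip. It assumes NLTK's `word_tokenize` and `pos_tag` with their tokenizer and tagger data already downloaded; `NOUN_TAGS`, `nltk_tagger`, `nouns_per_line` and `convert` are hypothetical names chosen for this example. The tagger is passed in as a parameter so the line-by-line logic can be tried without NLTK installed.

```python
from typing import Callable, Iterable, List, Tuple

# Penn Treebank noun tags, the same four the question filters on.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Hypothetical alias: a function that turns one line into (word, tag) pairs.
Tagger = Callable[[str], List[Tuple[str, str]]]


def nltk_tagger(line: str) -> List[Tuple[str, str]]:
    """Tokenize and POS-tag one line with NLTK (its data must be downloaded)."""
    import nltk  # imported lazily so the rest of the module works without NLTK
    return nltk.pos_tag(nltk.word_tokenize(line))


def nouns_per_line(lines: Iterable[str], tag: Tagger = nltk_tagger) -> List[str]:
    """Return one space-separated, lower-cased string of nouns per input line."""
    result = []
    for line in lines:
        nouns = [word.lower() for word, pos in tag(line) if pos in NOUN_TAGS]
        result.append(" ".join(nouns))  # collected, not overwritten
    return result


def convert(infile: str, outfile: str, tag: Tagger = nltk_tagger) -> None:
    """Write the nouns of each line of infile to the matching line of outfile."""
    with open(infile, encoding="utf8") as f:
        rows = nouns_per_line(f, tag)
    with open(outfile, "w", encoding="utf8") as out:
        for row in rows:
            out.write(row + "\n")
```

With the file names from the question this would be called as `convert("infile.txt", "outfile.csv")`; whether the output matches the example exactly depends on the tags NLTK assigns. An input line without any nouns produces an empty output line, which keeps the output lines aligned with the input lines.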