Emily

Reputation: 315

Extracting nouns using POS tagging with Python (looping)

I would like to extract only nouns or noun groups from a huge text file. The Python code below runs, but it extracts the nouns for only the last line. I am pretty sure the code needs 'append', but I don't know how to use it (I am a Python beginner).

import nltk
import pos_tag
import nltk.tokenize 
import numpy

f = open(r'infile.txt', encoding="utf8")
data = f.readlines()

tagged_list = []

for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    nouns = [word for word,pos in tagged \
            if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS')]
    downcased = [x.lower() for x in nouns]
    joined = " ".join(downcased).encode('utf-8')
    into_string = str(nouns)

output = open(r"outfile.csv", "wb")
output.write(joined)
output.close()

The result looks like this: apartment transport downtown, which are the nouns from the last line of the file. I'd like to save the nouns from each line of the input file on its own line of the output file. For example, the input file and the corresponding output should look like this:

Input file:
I like the milk.
I like the milk and bread.
I like the milk, bread, and butter.

Output file:
milk
milk bread
milk bread butter

I hope somebody can help me fix the code above.

Upvotes: 3

Views: 2425

Answers (1)

Alperen

Reputation: 4602

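Your loop overwrites joined on every iteration, so only the last line's nouns are left when you write the file. Accumulate each line's result at the end of the for loop, then write everything to the file after the loop.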

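...
result = ""
for line in data:
    ...
    # Keep joined as a str here (drop the .encode('utf-8') call);
    # adding bytes to a str raises a TypeError in Python 3.
    result += joined + "\n"  # one output line per input line

output = open(r"outfile.csv", "w", encoding="utf8")
output.write(result)
output.close()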

If you want to use append:

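...
result_list = []
for line in data:
    ...
    # Keep joined as a str here (drop the .encode('utf-8') call).
    result_list.append(joined)

output = open(r"outfile.csv", "w", encoding="utf8")
# Note: this writes the list's repr, e.g. ['milk', 'milk bread'], on a single line.
output.write(str(result_list))
output.close()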

If you use the result list, you can also write each item on its own line, which matches the output format you asked for:

...
output = open(r"outfile.csv", "w")
for item in result_list:
    output.write(str(item) + "\n")
output.close()
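
Putting the pieces together, here is a minimal end-to-end sketch of the per-line approach. The names NOUN_TAGS, nouns_in, extract_nouns, nltk_tagger and write_nouns are hypothetical helpers introduced only for this illustration, not part of NLTK. It assumes that joined stays a str, that infile.txt and outfile.csv are the paths from the question, and that NLTK's tokenizer and tagger data are already downloaded. The exact nouns you get depend on the tagger.

```python
# Penn Treebank noun tags, the same four tags checked in the question.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nouns_in(tagged):
    """Return the lowercased nouns from a list of (word, tag) pairs."""
    return [word.lower() for word, pos in tagged if pos in NOUN_TAGS]


def extract_nouns(lines, tagger):
    """Return one space-joined string of nouns per input line."""
    return [" ".join(nouns_in(tagger(line))) for line in lines]


def nltk_tagger(line):
    """Tokenize and POS-tag one line with NLTK (needs its data installed)."""
    import nltk  # imported lazily so the helpers above work without NLTK

    return nltk.pos_tag(nltk.word_tokenize(line))


def write_nouns(infile="infile.txt", outfile="outfile.csv"):
    """Write the nouns of each input line to its own output line."""
    with open(infile, encoding="utf8") as src, \
            open(outfile, "w", encoding="utf8") as dst:
        for result in extract_nouns(src, nltk_tagger):
            dst.write(result + "\n")


# Usage (assumption: infile.txt exists in the working directory):
# write_nouns("infile.txt", "outfile.csv")
```

Using with closes both files even if tagging raises an error, and writing inside the loop over results means nothing is overwritten between lines. For a very large file you could also write each line as soon as it is tagged instead of building the whole list first.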

Upvotes: 2
