Emily

Reputation: 315

Word-level POS tagging in Python

I am trying to POS-tag each word in each line (each line contains several sentences).

I have this code:

import nltk import pos_tag
import nltk.tokenize import word_tokenize

f = open('C:\Users\test_data.txt')
data = f.readlines()

#Parse the text file for NER with POS Tagging
for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.chunk.ne_chunk(tagged)
    print entities
f.close()

But the code gives one tag for each whole line instead of one per word, and the output looks like this:

[('The apartment is brand new and pristine in its cleanliness.', 'NNP'), ('"Awesome little place in the mountains."', 'NNP'), ('Very comfortable place close to the fatima luas stop. I love this place. \njose and vadym are very welcoming and treated me very well. \nwill stay again hopefully.', 'NNP'), ('Very helpful and communicative host. Excellent location, well connected to public transport . Room was a bit too small for a couple and the lack of cupboards was sorely felt.\n\nOtherwise quite clean and well maintained.', 'NNP'), ('Everything was exactly as described. It is beautiful. ', 'NNP')]

My code calls the tokenizer, so I don't know what's wrong with it. I need a POS tag for each word instead of for each line, but the words of each line should still be grouped (or distinguished) somehow, for example by parentheses.
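What is being asked for here, one (word, tag) pair per word but still grouped by line, comes down to tokenizing each line before tagging it and keeping one result list per line. Below is a minimal sketch of that structure. The tokenizer and tagger are passed in as parameters so it runs without NLTK's model data. The function name `tag_lines` and the placeholder `demo_tag` are hypothetical, and `str.split` is only a crude stand-in for `nltk.word_tokenize`.

```python
from typing import Callable, List, Tuple

# A tagged line: one (word, tag) pair per token.
Tagged = List[Tuple[str, str]]


def tag_lines(
    lines: List[str],
    tokenize: Callable[[str], List[str]],
    tag: Callable[[List[str]], Tagged],
) -> List[Tagged]:
    """Return one list of (word, tag) pairs per input line."""
    results = []
    for line in lines:
        tokens = tokenize(line)       # split the line into words first
        results.append(tag(tokens))   # then tag the words of this line only
    return results


def demo_tag(tokens: List[str]) -> Tagged:
    # Hypothetical placeholder tagger so the sketch runs without NLTK's model data.
    return [(tok, "TAG") for tok in tokens]


if __name__ == "__main__":
    lines = ["It is beautiful.", "Great host."]
    for tagged in tag_lines(lines, str.split, demo_tag):
        print(tagged)  # each line's words stay together in their own list
```

For real tags, the idea would be to call `tag_lines(data, nltk.word_tokenize, nltk.pos_tag)` once NLTK's tokenizer and tagger data are downloaded. Unlike `str.split`, `word_tokenize` also splits off punctuation such as the final period.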

Upvotes: 0

Views: 832

Answers (1)

mquantin

Reputation: 1158

(copied and pasted directly from what runs on my computer)

Running your code (note the simple import statement):

#!/usr/bin/env python3
# encoding: utf-8
import nltk
f = open('/home/matthieu/Téléchargements/testtext.txt')
data = f.readlines()

for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.chunk.ne_chunk(tagged)
    print(entities)
f.close()

On the following Unicode plain-text file (3 lines):

(this is a first example.)(Another sentence in another parentheses.)
(onlyone in that line)
this is a second one wihtout parenthesis. (Another sentence in another parentheses.)

I get the following results:

(S
(/(
this/DT
is/VBZ
a/DT
first/JJ
example/NN
./.
)/)
(/(
Another/DT
sentence/NN
in/IN
another/DT
parentheses/NNS
./.
)/))
(S (/( onlyone/NN in/IN that/DT line/NN )/))
(S
this/DT
...

As you can see, there is no particular problem: every word gets its own tag, and each line becomes its own (S ...) tree. Are you parsing your CSV data correctly? Is CSV useful in your case? Did you try using a simple text file?
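If the input really is a CSV file, Python's `csv.reader` yields each row as a list of field strings. Handing such fields to the tagger without tokenizing them would give one tag per whole field, much like the output in the question. The embedded `\n` characters in those strings also hint at quoted, multi-line CSV fields. Below is a minimal sketch that tokenizes a single text column before tagging it. It assumes the text sits in one column, and the names `tag_csv_column`, `text_col` and `demo_tag` are hypothetical. A placeholder tagger and `str.split` stand in for NLTK so it runs without model data.

```python
import csv
import io
from typing import Callable, Iterable, List, Tuple

# A tagged row: one (word, tag) pair per token of the text field.
Tagged = List[Tuple[str, str]]


def tag_csv_column(
    rows: Iterable[List[str]],
    text_col: int,
    tokenize: Callable[[str], List[str]],
    tag: Callable[[List[str]], Tagged],
) -> List[Tagged]:
    """Tag the words of one CSV column, keeping one result per row."""
    tagged_rows = []
    for row in rows:
        text = row[text_col]                      # one whole field, possibly multi-line
        tagged_rows.append(tag(tokenize(text)))   # tokenize first, then tag
    return tagged_rows


def demo_tag(tokens: List[str]) -> Tagged:
    # Hypothetical placeholder tagger so the sketch runs without NLTK's model data.
    return [(tok, "TAG") for tok in tokens]


if __name__ == "__main__":
    # A quoted field may contain a newline; csv.reader keeps it in one field.
    sample = 'id,review\n1,"Great host.\nWill stay again."\n2,It is beautiful.\n'
    reader = csv.reader(io.StringIO(sample))
    next(reader)  # skip the header row
    for tagged in tag_csv_column(reader, 1, str.split, demo_tag):
        print(tagged)
```

With NLTK set up, `nltk.word_tokenize` and `nltk.pos_tag` would replace `str.split` and `demo_tag`. When reading a real file, the `csv` module documentation recommends opening it with `newline=''` so that newlines inside quoted fields are handled correctly.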

Upvotes: 1
