Word-level POS tagging in Python

Question

I am trying to get a POS tag for each word in each line (each line contains several sentences).

I have this code:

import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize

f = open(r'C:\Users\test_data.txt')
data = f.readlines()

#Parse the text file for NER with POS Tagging
for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.chunk.ne_chunk(tagged)
    print(entities)
f.close()

But the code gives one tag for each line, and the output looks like this:

[('The apartment is brand new and pristine in its cleanliness.', 'NNP'), ('"Awesome little place in the mountains."', 'NNP'), ('Very comfortable place close to the fatima luas stop. I love this place. jose and vadym are very welcoming and treated me very well. will stay again hopefully.', 'NNP'), ('Very helpful and communicative host. Excellent location, well connected to public transport . Room was a bit too small for a couple and the lack of cupboards was sorely felt. Otherwise quite clean and well maintained.', 'NNP'), ('Everything was exactly as described. It is beautiful. ', 'NNP')]

My code calls the tokenizer, so I don't know what is wrong with it. I need a POS tag for each word rather than for each line. Each line should still be kept distinct, grouped by parentheses or something similar.

mquantin · Accepted Answer

(pure copy paste from what runs on my computer)

Running your code (note the simple import statement):

#!/usr/bin/env python3
# encoding: utf-8
import nltk
f = open('/home/matthieu/Téléchargements/testtext.txt')
data = f.readlines()

for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    entities = nltk.chunk.ne_chunk(tagged)
    print(entities)
f.close()

On the following unicode raw text file (3 lines):

(this is a first example.)(Another sentence in another parentheses.)
(onlyone in that line)
this is a second one wihtout parenthesis. (Another sentence in another parentheses.)

I get the following results:

(S
(/(
this/DT
is/VBZ
a/DT
first/JJ
example/NN
./.
)/)
(/(
Another/DT
sentence/NN
in/IN
another/DT
parentheses/NNS
./.
)/))
(S (/( onlyone/NN in/IN that/DT line/NN )/))
(S
this/DT
...

As you can see, there is no particular problem. Are you parsing your CSV data correctly? Is CSV useful in your case? Did you try using a simple text file?
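
If the goal is one group of word-level tags per line, the loop above can be wrapped so that each line yields its own list of (word, tag) pairs. The sketch below is only an illustration: `tag_lines`, `text_column` and `tag_file_with_nltk` are hypothetical helper names, and the CSV column index is an assumption about the data layout. The output shown in the question looks like what a tagger returns when it receives whole lines as if they were tokens, so the sketch makes the tokenize-then-tag order explicit.

```python
import csv


def tag_lines(lines, tokenize, tag):
    """Return one list of (word, tag) pairs per non-empty input line."""
    tagged_lines = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tokens = tokenize(line)           # split the line into words first
        tagged_lines.append(tag(tokens))  # then tag the words, not the line
    return tagged_lines


def text_column(lines, column=0):
    """Yield the text cell of each CSV row (column index is an assumption)."""
    for row in csv.reader(lines):
        if len(row) > column:
            yield row[column]


def tag_file_with_nltk(path):
    """Hypothetical helper: tag every line of a plain text file with NLTK."""
    import nltk  # imported lazily; requires the tokenizer and tagger data
    with open(path, encoding="utf-8") as f:
        return tag_lines(f, nltk.word_tokenize, nltk.pos_tag)
```

For CSV input, something like `tag_lines(text_column(f), nltk.word_tokenize, nltk.pos_tag)` on a file opened with `newline=""` would tag only the text cell of each row. Passing the tokenizer and tagger as arguments keeps the grouping logic separate from NLTK, so it can be checked with simple stand-ins.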
