kzs

Reputation: 1111

Stanford part-of-speech tagger gives AttributeError

I have tried several tutorials to learn how to use the Stanford part-of-speech tagger in Python. At present I am using the following code for POS tagging with the Stanford tagger, but I am getting an AttributeError. My code is below:

import nltk
from nltk.tag.stanford import StanfordPOSTagger
english_postagger = StanfordPOSTagger('/home/szk/Downloads/NL2API/NL2API/tutorials/postags/stanford-postagger-2018-10-16/models/english-bidirectional-distsim.tagger', '/home/szk/Downloads/NL2API/NL2API/tutorials/postags/stanford-postagger-2018-10-16/stanford-postagger.jar')
english_postagger.tag('this is stanford postagger in nltk for python users'.split())

The error trace is below:

Traceback (most recent call last):
  File "stanfordpostag.py", line 4, in <module>
    english_postagger.tag('this is stanford postagger in nltk for python users'.split())
  File "/home/szk/Downloads/NL2API/NL2API/newv/local/lib/python2.7/site-packages/nltk/tag/stanford.py", line 93, in tag
    return sum(self.tag_sents([tokens]), [])
  File "/home/szk/Downloads/NL2API/NL2API/newv/local/lib/python2.7/site-packages/nltk/tag/stanford.py", line 116, in tag_sents
    cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE
  File "/home/szk/Downloads/NL2API/NL2API/newv/local/lib/python2.7/site-packages/nltk/internals.py", line 112, in java
    subprocess_output_dict = {'pipe': subprocess.PIPE, 'stdout': subprocess.STDOUT, 'devnull': subprocess.DEVNULL}
AttributeError: 'module' object has no attribute 'DEVNULL'

Hopefully someone can provide a solution.

Upvotes: 0

Views: 309

Answers (2)

Simone

Reputation: 615

A couple of years later, the stanza package for Python exists. The alternative is of course to go via NLTK, but StanfordNLP recommends using Stanza instead. So I decided to use stanza and was surprised at how easy it is to use. Also, it doesn't use much CPU or have other heavy requirements.

import stanza

stanza.download('de')  # download the German models

# Initialise the German neural pipeline with tokenization and POS tagging
model = stanza.Pipeline('de', processors='tokenize,pos')

# Roughly: "I hope there won't be another stupid comment because I have
# a female first name and few reputation points."
doc = model('Ich hoffe es gibt nicht wieder einen blöden Kommentar weil ich einen weiblichen Vornamen habe und wenig Reputationspunkte.')
print(*[f'word: {word.text}\tupos: {word.upos}' for sent in doc.sentences for word in sent.words], sep='\n')

Stanza offers different models (tokenization, PoS, NER, sentiment) for different languages. Compared to TreeTagger, it is definitely easier to install. I am comparing its performance with spaCy's and HanTa's. My impression, yet to be confirmed, is that it performs better than those two for German.

Happy tagging!

Upvotes: 0

Christopher Manning

Reputation: 9450

I'm not sure why this doesn't work – it's still meant to – but from NLTK version 3.2.3 onward, you're much better off (for speed and scalability reasons) using the newer Stanford CoreNLP server interface discussed here: https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK
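
To give a rough idea of what talking to such a server involves, here is a minimal sketch that queries a CoreNLP server over HTTP using only the Python 3 standard library rather than the NLTK wrapper described on that page. It assumes a server has already been started separately and is listening on http://localhost:9000; the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server was started separately and listens here.
CORENLP_URL = "http://localhost:9000"


def build_request_url(base_url=CORENLP_URL):
    """Return the server URL with properties asking for POS tags as JSON."""
    properties = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return f"{base_url}/?{query}"


def extract_tags(response):
    """Flatten a parsed CoreNLP JSON response into (word, tag) pairs."""
    return [
        (token["word"], token["pos"])
        for sentence in response["sentences"]
        for token in sentence["tokens"]
    ]


def pos_tag(text, base_url=CORENLP_URL):
    """POST the text to the server and return its (word, tag) pairs."""
    request = urllib.request.Request(
        build_request_url(base_url), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request) as reply:
        return extract_tags(json.load(reply))


# Usage (needs the server to be running):
# pos_tag("this is stanford postagger in nltk for python users")
```

The server loads the models once and keeps them in memory, so repeated calls avoid the cost of launching a new Java process for every tagging request.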

So you might try that. It's fine to follow those instructions, but use the current 2018-10-05 CoreNLP release everywhere in place of the previous version referenced in the instructions.
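
As for the original error, one possible explanation, judging only from the traceback in the question: the paths point at a python2.7 site-packages directory, and subprocess.DEVNULL was only added to the standard library in Python 3.3. If that is the cause, running the script under Python 3 would be the first thing to try. A quick check, as a sketch (the helper name is made up):

```python
import subprocess
import sys


def devnull_available():
    """Return True if this interpreter's subprocess module defines DEVNULL."""
    return hasattr(subprocess, "DEVNULL")


# Report which interpreter is running and whether the attribute exists.
print("Python %d.%d" % sys.version_info[:2])
print("subprocess.DEVNULL available: %s" % devnull_available())
```

Running this with the same interpreter that runs stanfordpostag.py shows whether the attribute the traceback complains about is present at all.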

Upvotes: 1
