Reputation: 1111
I have tried several tutorials to learn how to use the Stanford part-of-speech tagger in Python. At present I am using the following code for POS tagging with the Stanford tagger, but I am getting an AttributeError. My code is below:
import nltk
from nltk.tag.stanford import StanfordPOSTagger
english_postagger = StanfordPOSTagger('/home/szk/Downloads/NL2API/NL2API/tutorials/postags/stanford-postagger-2018-10-16/models/english-bidirectional-distsim.tagger', '/home/szk/Downloads/NL2API/NL2API/tutorials/postags/stanford-postagger-2018-10-16/stanford-postagger.jar')
english_postagger.tag('this is stanford postagger in nltk for python users'.split())
The error trace is below:
Traceback (most recent call last):
  File "stanfordpostag.py", line 4, in <module>
    english_postagger.tag('this is stanford postagger in nltk for python users'.split())
  File "/home/szk/Downloads/NL2API/NL2API/newv/local/lib/python2.7/site-packages/nltk/tag/stanford.py", line 93, in tag
    return sum(self.tag_sents([tokens]), [])
  File "/home/szk/Downloads/NL2API/NL2API/newv/local/lib/python2.7/site-packages/nltk/tag/stanford.py", line 116, in tag_sents
    cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE
  File "/home/szk/Downloads/NL2API/NL2API/newv/local/lib/python2.7/site-packages/nltk/internals.py", line 112, in java
    subprocess_output_dict = {'pipe': subprocess.PIPE, 'stdout': subprocess.STDOUT, 'devnull': subprocess.DEVNULL}
AttributeError: 'module' object has no attribute 'DEVNULL'
Hopefully someone can provide a solution.
Upvotes: 0
Views: 309
Reputation: 615
A couple of years later, the stanza package for Python exists. The alternative is of course to go via NLTK (see instructions here), but StanfordNLP recommends using Stanza (see here). So I decided to use Stanza and was surprised how easy it is to use. It also doesn't use much CPU or have other heavy requirements.
import stanza
stanza.download('de') # download German model
# initialise German neural pipeline
model = stanza.Pipeline('de', processors='tokenize, pos')
doc = model('Ich hoffe es gibt nicht wieder einen blöden Kommentar weil ich einen weiblichen Vornamen habe und wenig Reputationspunkte.')
print(*[f'word: {word.text}\tupos: {word.upos}' for sent in doc.sentences for word in sent.words], sep='\n')
Stanza offers different models (tokenization, PoS, NER, sentiment) for different languages; check here for details. Compared to TreeTagger it is definitely easier to install. I am comparing its performance with spaCy's and HanTa's. My as-yet-unconfirmed impression is that it performs better than those two for German.
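If you need the tags as data rather than printed lines, the loop in the snippet above can be wrapped in a small helper. This is only a sketch: `tagged_pairs` and `tag_german` are names made up here, and the code relies solely on the `doc.sentences`, `sent.words`, `word.text` and `word.upos` attributes already used above. `tag_german` assumes stanza is installed and the German model has been downloaded.

```python
def tagged_pairs(doc):
    """Return (text, upos) pairs for every word in a Stanza-style document.

    Hypothetical helper: it only touches doc.sentences, sent.words,
    word.text and word.upos, as in the snippet above.
    """
    return [(word.text, word.upos) for sent in doc.sentences for word in sent.words]


def tag_german(text):
    """Tag German text with Stanza and return (text, upos) pairs.

    Assumes stanza is installed and stanza.download('de') has been run.
    """
    import stanza  # imported lazily so tagged_pairs works without stanza installed

    # Same pipeline configuration as in the snippet above.
    nlp = stanza.Pipeline('de', processors='tokenize, pos')
    return tagged_pairs(nlp(text))
```

Calling `tag_german('Das ist ein Test.')` should then give a list of tuples that you can filter or count like any other Python list.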
Happy tagging!
Upvotes: 0
Reputation: 9450
I'm not sure why this doesn't work – it's still meant to – but from NLTK version 3.2.3 forward, you're much better off (for speed and scalability reasons) using the newer Stanford CoreNLP server interface discussed here: https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK .
So you might try that. It's fine to follow those instructions, but substitute the current 2018-10-05 CoreNLP release everywhere for the older version referenced in the instructions.
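One thing the last frame of the traceback does suggest: `subprocess.DEVNULL` was added in Python 3.3, and the paths in the traceback show a Python 2.7 environment, so the interpreter version may be the trigger rather than the tagger itself. Below is a minimal sketch of that check together with the CoreNLP route. It assumes a CoreNLP server is already running at `http://localhost:9000` (that URL is an assumption for illustration) and an NLTK version recent enough that `CoreNLPParser` accepts `tagtype='pos'`, as described on the wiki page linked above (as far as I know, 3.3 or later). `has_devnull` and `tag_with_corenlp` are hypothetical helper names.

```python
import subprocess
import sys


def has_devnull():
    """True if this interpreter's subprocess module defines DEVNULL (Python 3.3+)."""
    return hasattr(subprocess, "DEVNULL")


def tag_with_corenlp(tokens, url="http://localhost:9000"):
    """POS-tag a list of tokens via a running CoreNLP server.

    Assumes a server is already listening at `url`; the default URL
    is an assumption, not something taken from the question.
    """
    from nltk.parse.corenlp import CoreNLPParser  # lazy import: needs nltk installed

    tagger = CoreNLPParser(url=url, tagtype="pos")
    return tagger.tag(tokens)


# Report whether the attribute from the traceback exists in this interpreter.
print("Python", sys.version.split()[0], "has subprocess.DEVNULL:", has_devnull())
```

With the server up, `tag_with_corenlp('this is stanford postagger in nltk for python users'.split())` should return a list of (token, tag) pairs, the same shape the question's `tag` call was expected to produce.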
Upvotes: 1