Hirak Sarkar

Reputation: 519

Stanford Tagger in nltk not working due to JVM parameters

I am getting a weird error while running the following example code snippet:

from nltk.tag.stanford import StanfordTagger

st = StanfordTagger('bidirectional-distsim-wsj-0-18.tagger')
st.tag('What is the airspeed of an unladen swallow ?'.split())

The first line (creating the tagger) works, but the second line fails with the following error:

Could not create the Java virtual machine.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/nltk-2.0.1rc1-py2.6.egg/nltk/tag/stanford.py", line 51, in tag
    return self.batch_tag([tokens])[0]
  File "/usr/local/lib/python2.6/dist-packages/nltk-2.0.1rc1-py2.6.egg/nltk/tag/stanford.py", line 77, in batch_tag
    stdout=PIPE, stderr=PIPE)
  File "/usr/local/lib/python2.6/dist-packages/nltk-2.0.1rc1-py2.6.egg/nltk/internals.py", line 166, in java
    raise OSError('Java command failed!')
OSError: Java command failed!

I have tried adding /usr/lib/jvm to my PATH, but it is still not working.
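One thing worth checking: on Debian/Ubuntu the JDKs installed under /usr/lib/jvm usually keep the `java` executable in a `bin` subdirectory of each JDK, so putting /usr/lib/jvm itself on PATH may not make `java` findable. The sketch below (stdlib only) looks for an executable; `find_java_binary` and its parameters are hypothetical names for illustration, and the `<jdk>/bin/java` layout is an assumption about a typical install.

```python
import glob
import os
import shutil


def find_java_binary(jvm_root="/usr/lib/jvm", search_path=None):
    """Hypothetical helper: return the path of a usable `java` executable, or None.

    search_path is passed to shutil.which; None means "use $PATH".
    Assumes JDKs live in <jvm_root>/<jdk>/bin/java (typical Debian/Ubuntu layout).
    """
    # 1. Anything already reachable through PATH wins.
    on_path = shutil.which("java", path=search_path)
    if on_path:
        return on_path
    # 2. Otherwise look inside each JDK/JRE directory under jvm_root.
    #    The executable sits in <jdk>/bin/java, not in jvm_root itself.
    pattern = os.path.join(jvm_root, "*", "bin", "java")
    for candidate in sorted(glob.glob(pattern)):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If it returns something like `/usr/lib/jvm/<jdk>/bin/java`, the directory that would need to be on PATH is `/usr/lib/jvm/<jdk>/bin`. Finding `java` does not rule out the JVM itself failing to start, which is what "Could not create the Java virtual machine" suggests.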

Upvotes: 1

Views: 1954

Answers (2)

NG_

Reputation: 7173

I see that this question is very old, but recently I got the same error for no obvious reason. It gave me a lot of headaches, but I found a solution.

First, I installed Oracle Java (here are the instructions: How To Manually Install Oracle Java on a Debian or Ubuntu VPS).

After that, my Python script reported more information about the error. It printed something like:

Forking JVM: error=12, Cannot allocate memory or error=12, Not enough space 

You can read more about this problem here: Forking the JVM

To avoid that error, I edited /etc/sysctl.conf and added the following line:

vm.overcommit_memory = 1

Then restart the system for the change to take effect.
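To confirm the setting, here is a small Linux-only sketch using the standard library. `parse_sysctl_conf` and `current_overcommit_mode` are hypothetical helper names, not part of any library; the mode descriptions follow the Linux kernel's documented meanings of `vm.overcommit_memory` (0, 1, 2).

```python
from pathlib import Path

# Documented meanings of the Linux vm.overcommit_memory sysctl.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "don't overcommit (strict accounting)",
}


def parse_sysctl_conf(text):
    """Hypothetical helper: return {key: value} for 'key = value' lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def current_overcommit_mode(proc_file="/proc/sys/vm/overcommit_memory"):
    """Hypothetical helper: read the live kernel value (an int such as 0, 1 or 2)."""
    return int(Path(proc_file).read_text().strip())
```

After the restart, `current_overcommit_mode()` should return 1, and `parse_sysctl_conf(Path("/etc/sysctl.conf").read_text())` should contain `"vm.overcommit_memory": "1"`. Running `sudo sysctl -p` also reloads /etc/sysctl.conf without a reboot. Note that mode 1 is system-wide: the kernel stops refusing large allocations, so running out of memory is handled later (by the OOM killer) instead of at fork time.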

Upvotes: 2

afs

Reputation: 167

It wasn't working for me either, so I tried the following (passing the path to the .jar file explicitly), and it works perfectly:

from nltk.tag.stanford import POSTagger

st = POSTagger('path-to/stanford-postagger-full-2012-07-09/models/wsj-0-18-left3words.tagger',
               'path-to/stanford-postagger-full-2012-07-09/stanford-postagger.jar')

Then use NLTK's word_tokenize function instead of Python's str.split():

import nltk

sentence = 'What is the airspeed of an unladen swallow ?'
taggedSentence = st.tag(nltk.word_tokenize(sentence))
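The difference between the two tokenizers matters when punctuation is attached to a word: `str.split()` only breaks on whitespace. The sketch below uses a simple regular expression as a rough stand-in for `nltk.word_tokenize`; it is not NLTK's tokenizer (which, for example, splits "don't" into "do" and "n't"), and `rough_tokenize` is a hypothetical name used only for illustration.

```python
import re

# Rough stand-in for nltk.word_tokenize: a run of word characters, or any single
# character that is neither a word character nor whitespace (i.e. punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def rough_tokenize(text):
    """Hypothetical helper: split text into words and punctuation marks."""
    return TOKEN_RE.findall(text)


sentence = "What is the airspeed of an unladen swallow?"
print(sentence.split())          # 'swallow?' stays a single token
print(rough_tokenize(sentence))  # 'swallow' and '?' become separate tokens
```

A tagger trained on Penn Treebank-style tokens expects the second form, which is why a real tokenizer is preferable to `split()` for arbitrary sentences.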

Upvotes: 2
