Gavrk

Reputation: 325

NLTK unable to find java.exe (spontaneous path reduction)

Similar questions were posted here and here, and my question is actually based on what was suggested in answers to those questions.

I am trying to parse some German texts using the Stanford Parser and NLTK.

from nltk.parse.stanford import StanfordParser
import os
os.environ['STANFORD_PARSER'] ='C:\PretestKorpus\stanford-parser-full-2018-10-17'
os.environ['STANFORD_MODELS'] = 'C:\PretestKorpus\stanford-parser-full-2018-10-17'
parser=StanfordParser(model_path="C:\PretestKorpus\germanPCFG.ser.gz")
new=list(parser.raw_parse("Es war einmal ein Bauer"))

Then, of course, I get the "NLTK was unable to find the java file!" error.

So I set the Java path explicitly, like this:

nltk.internals.config_java('C:\Program Files (x86)\Java\jre1.8.0_251\bin\java.exe')

but it returns

NLTK was unable to find the C:\Program Files (x86)\Java\jre1.8.0_251in\java.exe file!
Use software specific configuration paramaters or set the JAVAHOME environment variable.

So, somehow Python reduces the path \\jre1.8.0_251\bin\java.exe to \\jre1.8.0_251in\java.exe

It looks like this:

[screenshot of the console output]

Setting the environment variable does not help either (it returns the "NLTK was unable to find the java file!" error). Obviously, Python does not read the path correctly. But why, and how can I fix it? Any help will be appreciated.

Upvotes: 0

Views: 125

Answers (1)

Amir Schnell

Reputation: 651

In Python, \b inside a string literal is interpreted as a backspace character. That is why you see the white BS in the picture: the console is trying to represent this special character (BS for backspace).
What you need to do is escape each \ inside your string, like so:

nltk.internals.config_java('C:\\Program Files (x86)\\Java\\jre1.8.0_251\\bin\\java.exe')

It is good practice to always escape backslash characters in Windows paths, so you can be sure that problems like this one never occur.
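
To see the effect without NLTK, here is a minimal sketch that compares the three spellings of the same path fragment. The variable names are made up for illustration. A raw string literal (the r prefix) is shown as an assumed alternative to doubling every backslash; it is not part of the original answer.

```python
# "\b" is a recognized escape sequence (backspace, U+0008), so the
# backslash and the "b" collapse into one control character.
broken = 'jre1.8.0_251\bin'

# Doubling the backslash keeps it as a literal backslash.
escaped = 'jre1.8.0_251\\bin'

# A raw string literal leaves backslashes untouched (assumed alternative).
raw = r'jre1.8.0_251\bin'

print(repr(broken))    # the backspace shows up as \x08
print(repr(escaped))   # a real backslash followed by "bin"
print(escaped == raw)  # both spellings give the same string

# The full path from the question, written both safe ways.
java_escaped = 'C:\\Program Files (x86)\\Java\\jre1.8.0_251\\bin\\java.exe'
java_raw = r'C:\Program Files (x86)\Java\jre1.8.0_251\bin\java.exe'
print(java_escaped == java_raw)
```

Either java_escaped or java_raw can then be passed to nltk.internals.config_java in place of the original string.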

Upvotes: 2
