Reputation: 83
Unlike StanfordPOSTagger or StanfordNER, StanfordSegmenter does not have an interface in NLTK. To use it, I have to create the interface manually, namely stanford_segmenter.py under ../nltk/tokenize/. I followed the instructions here: http://textminingonline.com/tag/chinese-word-segmenter
However, when I tried to run
from nltk.tokenize.stanford_segmenter import stanford_segmenter
I got this error message:
Traceback (most recent call last):
  File "C:\Users\qubo\Desktop\stanfordparserexp.py", line 48, in <module>
    from nltk.tokenize.stanford_segmenter import stanford_segmenter
ImportError: No module named stanford_segmenter
[Finished in 0.6s]
The instructions say to reinstall NLTK after creating stanford_segmenter.py. I don't quite see the point, but I did so anyway. However, the process could hardly be called a 'reinstall'; it was more like detaching NLTK from the Python libs and reconnecting it.
I'm using 64-bit Windows and Python 2.7.11. NLTK and all relevant packages are updated to the latest versions. I wonder if anyone can shed some light on this. Thank you all so much.
Upvotes: 2
Views: 973
Reputation: 83
I was able to import the module by running the following code:
import imp
yourmodule = imp.load_source("module_name", "/path/to/module_name.py")
yourclass = yourmodule.TheClass()
Here yourclass is an instance of the class, and TheClass is the name of the class you want to instantiate. This is similar to:
from pkg_name.module_name import TheClass
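Note that imp is deprecated on Python 3 and was removed in Python 3.12. A minimal sketch of the same load-by-path idea using importlib.util follows. The helper name load_module_from_path is hypothetical and is not part of NLTK; the paths in the usage comment are placeholders.

```python
import importlib.util
import sys


def load_module_from_path(module_name, file_path):
    """Load a Python source file as a module (hypothetical helper, Python 3 only)."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %r from %r" % (module_name, file_path))
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it so that code inside the
    # file can refer to its own module by name.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Usage, mirroring the imp.load_source call (placeholder path):
# yourmodule = load_module_from_path("module_name", "/path/to/module_name.py")
# yourclass = yourmodule.TheClass()
```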
So in the case of StanfordSegmenter, the complete code is as follows:
# -*- coding: utf-8 -*-
import imp
import os

# Root of the unpacked Stanford Segmenter distribution.
ini_path = 'D:/jars/stanford-segmenter-2015-04-20/'
os.environ['STANFORD_SEGMENTER'] = ini_path + 'stanford-segmenter-3.5.2.jar'
# Load the hand-made interface directly from its file path.
stanford_segmenter = imp.load_source("stanford_segmenter", "C:/Users/qubo/Miniconda2/pkgs/nltk-3.1-py27_0/Lib/site-packages/nltk/tokenize/stanford_segmenter.py")
seg = stanford_segmenter.StanfordSegmenter(
    path_to_model=ini_path + 'data/pku.gz',
    path_to_jar=ini_path + 'stanford-segmenter-3.5.2.jar',
    path_to_dict=ini_path + 'data/dict-chris6.ser.gz',
    path_to_sihan_corpora_dict=ini_path + 'data')
sent = '我有一只小毛驴我从来也不骑。'
# Python 2: decode the UTF-8 byte string to unicode before segmenting.
text = seg.segment(sent.decode('utf-8'))
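If the plain import still fails, one thing worth checking is whether stanford_segmenter.py actually sits in the nltk directory that the interpreter imports. This is only a guess at the cause, not something confirmed here: the Miniconda pkgs path used above is conda's package cache and may be a different copy from the one on sys.path. The sketch below uses a hypothetical helper, find_in_package, and should run on both Python 2.7 and Python 3.

```python
import importlib
import os


def find_in_package(package_name, relative_path):
    """Return (candidate_path, exists) for a file inside an importable package.

    Hypothetical helper: it imports the package, takes the directory that
    the interpreter really loaded it from, and checks for the file there.
    """
    package = importlib.import_module(package_name)
    package_dir = os.path.dirname(os.path.abspath(package.__file__))
    candidate = os.path.join(package_dir, relative_path)
    return candidate, os.path.isfile(candidate)


# For the case in this thread (requires nltk to be importable):
# path, exists = find_in_package(
#     "nltk", os.path.join("tokenize", "stanford_segmenter.py"))
# print(path)
# print(exists)
```

If the second value is False, the interface file is not in the copy of nltk that Python is using, and copying it to the printed location would be the next thing to try.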
Upvotes: 1