fbabelle

Reputation: 83

ImportError: No module named stanford_segmenter

Unlike the StanfordPOSTagger or StanfordNER tagger, the StanfordSegmenter does not come with an interface in NLTK. So to use it, I basically have to create the interface manually, namely a stanford_segmenter.py under ../nltk/tokenize/. I followed the instructions here: http://textminingonline.com/tag/chinese-word-segmenter
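A quick way to confirm where the file needs to go is to check which NLTK installation Python actually imports; the layout below assumes a standard NLTK install and is just a sanity-check sketch:

import os
import nltk

nltk_dir = os.path.dirname(nltk.__file__)  # the nltk package Python actually imports
target = os.path.join(nltk_dir, 'tokenize', 'stanford_segmenter.py')
print(nltk_dir)
print(os.path.exists(target))  # False means the file sits in a different copy of nltk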

However, when I tried to run

from nltk.tokenize.stanford_segmenter import stanford_segmenter

I got the following error:

Traceback (most recent call last):
  File "C:\Users\qubo\Desktop\stanfordparserexp.py", line 48, in <module>
    from nltk.tokenize.stanford_segmenter import stanford_segmenter
ImportError: No module named stanford_segmenter
[Finished in 0.6s]

The instructions say to reinstall NLTK after creating stanford_segmenter.py. I don't quite see the point, but I did it anyway. The process can hardly be called a 'reinstall', though; it was more like detaching NLTK from the Python libraries and reconnecting it.

I'm using 64-bit Windows and Python 2.7.11. NLTK and all relevant packages are updated to the latest versions. I'd be grateful if you could shed some light on this. Thank you all so much.

Upvotes: 2

Views: 973

Answers (1)

fbabelle

Reputation: 83

I was able to import the module by running the following code:

import imp

# Load the module straight from its file path, then instantiate a class it defines
yourmodule = imp.load_source("module_name", "/path/to/module_name.py")
yourclass = yourmodule.TheClass()

Here yourclass is an instance of TheClass, the class from the loaded module that you want to instantiate. This is similar to using:

from pkg_name.module_name import TheClass

So in the case of the StanfordSegmenter, the complete code is as follows:

# -*- coding: utf-8 -*-
import imp
import os

# Point the environment variable at the segmenter jar
ini_path = 'D:/jars/stanford-segmenter-2015-04-20/'
os.environ['STANFORD_SEGMENTER'] = ini_path + 'stanford-segmenter-3.5.2.jar'
# Load the manually created interface straight from its file path
stanford_segmenter = imp.load_source(
    "stanford_segmenter",
    "C:/Users/qubo/Miniconda2/pkgs/nltk-3.1-py27_0/Lib/site-packages/nltk/tokenize/stanford_segmenter.py")
# Build the segmenter with the PKU model and the supporting dictionaries
seg = stanford_segmenter.StanfordSegmenter(
    path_to_model=ini_path + 'data/pku.gz',
    path_to_jar=ini_path + 'stanford-segmenter-3.5.2.jar',
    path_to_dict=ini_path + 'data/dict-chris6.ser.gz',
    path_to_sihan_corpora_dict=ini_path + 'data')

sent = '我有一只小毛驴我从来也不骑。'
text = seg.segment(sent.decode('utf-8'))  # decode to unicode under Python 2
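For a quick sanity check on the result (assuming, as with the NLTK-style interface, that segment() returns the segmented sentence as a single whitespace-separated string), the output can be inspected like this:

tokens = text.split()  # the segmenter's output is assumed to be whitespace-separated
print(len(tokens))     # number of segmented tokens
print(text)            # the segmented sentence itself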

Upvotes: 1
