Reputation: 597
My NLTK data is in ~/nltk_data/corpora/words/ (files: en, en-basic, README).
According to __init__.py inside ~/lib/python2.7/site-packages/nltk/corpus, to read a list of the words in the Brown Corpus, use nltk.corpus.brown.words():
from nltk.corpus import brown
print brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
This __init__.py has:
words = LazyCorpusLoader(
'words', WordListCorpusReader, r'(?!README|\.).*')
So when I write from nltk.corpus import words, am I importing the 'words' function from __init__.py, which resides in the directory python2.7/site-packages/nltk/corpus?
Also why does this happen:
import nltk.corpus.words
ImportError: No module named words
from nltk.corpus import words
# WORKS FINE
The "brown" corpus resides inside ~/nltk_data/corpora
(and not in nltk/corpus). So why does this command work?
from nltk.corpus import brown
Shouldn't it be this?
from nltk_data.corpora import brown
Upvotes: 1
Views: 1014
Reputation: 3533
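1.] Yes. The name words in that __init__.py is a LazyCorpusLoader instance rather than a plain function. LazyCorpusLoader comes from nltk/corpus/util.py, where you can find the following description: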
"""
A proxy object which is used to stand in for a corpus object
before the corpus is loaded. This allows NLTK to create an object
for each corpus, but defer the costs associated with loading those
corpora until the first time that they're actually accessed.
The first time this object is accessed in any way, it will load
the corresponding corpus, and transform itself into that corpus
(by modifying its own ``__class__`` and ``__dict__`` attributes).
If the corpus can not be found, then accessing this object will
raise an exception, displaying installation instructions for the
NLTK data package. Once they've properly installed the data
package (or modified ``nltk.data.path`` to point to its location),
they can then use the corpus object without restarting python.
"""
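The docstring above describes a proxy that turns itself into the real object on first access. A minimal sketch of that idea follows. It is not NLTK's implementation: WordList, LazyLoader, and make_words are hypothetical names used only for illustration, and the "corpus" is a hard-coded list.

```python
class WordList(object):
    """Stand-in for a real corpus reader (hypothetical, not an NLTK class)."""

    def __init__(self, items):
        self._items = list(items)

    def words(self):
        return self._items


class LazyLoader(object):
    """Proxy that becomes the real object the first time it is used."""

    def __init__(self, factory):
        self._factory = factory

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, i.e. before loading.
        if name == '_factory' or name.startswith('__'):
            raise AttributeError(name)
        real = self._factory()
        # Transform this proxy into the real object in place.
        self.__dict__ = real.__dict__
        self.__class__ = real.__class__
        return getattr(self, name)


calls = []

def make_words():
    # Record each load so we can see that it happens only once.
    calls.append('loaded')
    return WordList(['The', 'Fulton', 'County'])


words = LazyLoader(make_words)   # nothing is loaded yet
before = type(words).__name__
first = words.words()            # first access triggers the load
after = type(words).__name__
```

After the first call, words is an ordinary WordList, so later attribute lookups no longer pass through the proxy.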
3.] nltk_data is the folder that holds the data. That does not mean the Python module lives in that folder too: the module is nltk.corpus, and only the corpus data is downloaded into nltk_data.
NLTK has built-in support for dozens of corpora and trained models, as listed below. To use these within NLTK we recommend that you use the NLTK corpus downloader, >>> nltk.download()
Upvotes: 0
Reputation: 34205
Re. point 2: You can import either a module (import module.submodule), or an object from a module (from module.submodule import variable). While you can treat a module as a variable, because it actually is a variable in that scope (from module import submodule), it doesn't work the other way. That's why when you try doing import module.submodule.variable, it fails.
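The same rule can be checked with the standard library, without NLTK installed. In this sketch os.path plays the role of nltk.corpus (a module) and os.path.join plays the role of words (an object inside it); is_importable_module is a hypothetical helper name.

```python
import importlib

# 'import a.b' requires b to be a module; os.path is one, so this works.
import os.path

# 'from a.b import c' accepts any attribute of a.b, including a function.
from os.path import join


def is_importable_module(dotted_name):
    """Return True if the dotted name can be imported as a module."""
    try:
        importlib.import_module(dotted_name)
        return True
    except ImportError:
        return False


module_ok = is_importable_module('os.path')
# join is a function, not a module, so this is the same failure as
# 'import nltk.corpus.words' in the question.
function_as_module = is_importable_module('os.path.join')
```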
Re. point 3: Depends on what nltk.corpus does. Maybe it searches/loads the nltk_data for you automatically.
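To illustrate how a module in site-packages could locate data that lives elsewhere, here is a simplified sketch of a search over a list of data directories. It is an assumption-laden stand-in, not NLTK's code: NLTK does its own lookup based on nltk.data.path, while find_corpus and the temporary directories below are hypothetical.

```python
import os
import tempfile


def find_corpus(name, search_paths):
    """Return the first existing <path>/corpora/<name> directory."""
    for base in search_paths:
        candidate = os.path.join(base, 'corpora', name)
        if os.path.isdir(candidate):
            return candidate
    raise LookupError('corpus %r not found in %r' % (name, search_paths))


# Build a throwaway data directory that mimics ~/nltk_data/corpora/brown.
data_root = tempfile.mkdtemp()
os.makedirs(os.path.join(data_root, 'corpora', 'brown'))
empty_root = tempfile.mkdtemp()

# The code doing the lookup can live anywhere; only the search path matters.
found = find_corpus('brown', [empty_root, data_root])
```

This is why from nltk.corpus import brown can work even though the brown data sits under ~/nltk_data: the import names the Python module, and the data directory is resolved separately at access time.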
Upvotes: 2