Reputation: 597

understanding nltk with python

My NLTK data is in ~/nltk_data/corpora/words/ (which contains en, en-basic and README).

According to __init__.py inside ~/lib/python2.7/site-packages/nltk/corpus, to read a list of the words in the Brown Corpus, use nltk.corpus.brown.words():

```
>>> from nltk.corpus import brown
>>> print brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
```

This __init__.py has

```
words = LazyCorpusLoader(
    'words', WordListCorpusReader, r'(?!README|\.).*')
```

So when I write `from nltk.corpus import words`, am I importing the `words` object defined in that `__init__.py`, which resides in python2.7/site-packages/nltk/corpus?

Also, why does this happen?

```
>>> import nltk.corpus.words
ImportError: No module named words
>>> from nltk.corpus import words  # works fine
```

The "brown" corpus resides inside ~/nltk_data/corpora (and not in nltk/corpus). So why does this command work?
```
from nltk.corpus import brown
```
Shouldn't it be this?
```
from nltk_data.corpora import brown
```

Upvotes: 1

Answers (2)

badc0re

Reputation: 3533

1.] Yes. `words` is an instance of `LazyCorpusLoader` (defined in `nltk.corpus.util`), which defers loading the corpus until it is first used. Its docstring says:

"""
    A proxy object which is used to stand in for a corpus object
    before the corpus is loaded.  This allows NLTK to create an object
    for each corpus, but defer the costs associated with loading those
    corpora until the first time that they're actually accessed.

    The first time this object is accessed in any way, it will load
    the corresponding corpus, and transform itself into that corpus
    (by modifying its own ``__class__`` and ``__dict__`` attributes).

    If the corpus can not be found, then accessing this object will
    raise an exception, displaying installation instructions for the
    NLTK data package.  Once they've properly installed the data
    package (or modified ``nltk.data.path`` to point to its location),
    they can then use the corpus object without restarting python.
    """

3.] nltk_data is the folder where the data lives. That does not mean the Python module is also in that folder (the data is downloaded from the nltk_data repository).

NLTK has built-in support for dozens of corpora and trained models. To use these within NLTK, the recommended way is the NLTK corpus downloader:

```
>>> import nltk
>>> nltk.download()
```

Upvotes: 0

viraptor

Reputation: 34205

Re. point 2: You can import either a module (`import module.submodule`) or an object from a module (`from module.submodule import variable`). A submodule is itself bound as a variable in its parent's namespace, so you can also import it like a variable (`from module import submodule`). It doesn't work the other way round, though: a dotted `import` statement only accepts modules, so `import module.submodule.variable` fails. In your case `nltk.corpus.words` is a variable defined in `nltk/corpus/__init__.py`, not a module.
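To see this without installing NLTK, the sketch below recreates the situation with the standard library only. `fakecorpus` is a hypothetical package standing in for `nltk.corpus`, and `try_import` is a hypothetical helper; the exact error text differs between Python 2 and Python 3.

```python
import os
import sys
import tempfile

# Build a throwaway package that mimics nltk/corpus: its __init__.py
# defines a plain variable called "words" (no words.py module exists).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "fakecorpus")  # hypothetical package name
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("words = ['alpha', 'beta']\n")
sys.path.insert(0, root)


def try_import(statement):
    """Run one import statement in a fresh namespace (hypothetical helper).

    Returns "ok" on success, or the ImportError message on failure.
    """
    try:
        exec(statement, {})
        return "ok"
    except ImportError as exc:
        return "ImportError: %s" % exc


# Importing an object *from* a module works: "words" is a name
# bound in fakecorpus/__init__.py.
print(try_import("from fakecorpus import words"))

# A dotted import must name modules all the way down; fakecorpus.words
# is a variable, not a module, so this fails.
print(try_import("import fakecorpus.words"))

# A real submodule (os.path) can be imported either way.
print(try_import("import os.path"))
print(try_import("from os import path"))
```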

Re. point 3: It depends on what nltk.corpus does. Presumably it searches for and loads the data in nltk_data for you automatically.
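For what it's worth, NLTK does keep a list of data directories in `nltk.data.path` (it typically includes `~/nltk_data`), and `nltk.data.find()` searches those directories for a resource such as `corpora/brown`. That is why the package you import (`nltk.corpus`) and the place the data lives are independent. The sketch below imitates that kind of search with the standard library only; `find_resource` is a hypothetical helper, and NLTK's real lookup handles more cases than this.

```python
import os
import tempfile


def find_resource(resource_name, search_paths):
    """Hypothetical, simplified data lookup: return the path inside the
    first search_paths entry that contains resource_name."""
    for base in search_paths:
        candidate = os.path.join(base, resource_name)
        if os.path.exists(candidate):
            return candidate
    raise LookupError("Resource %r not found in %r" % (resource_name, search_paths))


# Fake a data directory laid out like ~/nltk_data/corpora/brown.
data_home = tempfile.mkdtemp()
os.makedirs(os.path.join(data_home, "corpora", "brown"))

# The first entry does not exist, so the search falls through to data_home.
search_paths = [os.path.join(data_home, "missing"), data_home]
print(find_resource(os.path.join("corpora", "brown"), search_paths))
```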

Upvotes: 2
