Build the corpus by Wikipedia: ModuleNotFoundError: No module named 'gensim'

Question

I copied a simple Python script from Building a Wikipedia Text Corpus for Natural Language Processing. It uses gensim to build a corpus by stripping all Wikipedia markup from the articles. This is the code:

"""
Creates a corpus from Wikipedia dump file.
Inspired by:
https://github.com/panyang/Wikipedia_Word2vec/blob/master/v1/process_wiki.py
"""

import sys
from gensim.corpora import WikiCorpus

def make_corpus(in_f, out_f):
    """Convert Wikipedia xml dump file to text corpus"""

    output = open(out_f, 'w')
    wiki = WikiCorpus(in_f)

    i = 0
    for text in wiki.get_texts():
        output.write(bytes(' '.join(text), 'utf-8').decode('utf-8') + '\n')
        i = i + 1
        if (i % 10000 == 0):
            print('Processed ' + str(i) + ' articles')
    output.close()
    print('Processing complete!')


if __name__ == '__main__':

    if len(sys.argv) != 3:
        print('Usage: python make_wiki_corpus.py  ')
        sys.exit(1)
    in_f = sys.argv[1]
    out_f = sys.argv[2]
    make_corpus(in_f, out_f)

However, when I run it I get the error:

ModuleNotFoundError: No module named 'gensim'

although I have installed the gensim package:

python3 -m pip install gensim

EDIT. If I try with

pip install -U gensim

I obtain the error

ImportError: cannot import name 'SourceDistribution' from 'pip._internal.distributions.source' (C:\Users\Standard\Anaconda3\lib\site-packages\pip\_internal\distributions\source\__init__.py)

Harshal Parekh · Accepted Answer

The error means the gensim module is not installed in the Python environment that runs your script. Install it with:

pip install -U gensim

Or download it from https://pypi.python.org/pypi/gensim.

gensim depends on scipy and numpy. You must have them installed prior to installing gensim.
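Before reinstalling, it may help to confirm which interpreter is running the script and whether gensim is importable from it. The following is a minimal sketch using only the standard library; the helper name module_available is made up for illustration and is not part of gensim or pip.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter.

    Hypothetical helper for illustration; it only looks the module up,
    it does not import it.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter that is actually executing this script.
    print("Interpreter:", sys.executable)
    # Whether gensim is visible from that interpreter's import path.
    print("gensim available:", module_available("gensim"))
```

If the interpreter printed here is not the one whose pip you used (for example, an Anaconda Python versus a separate python3), install gensim with that same interpreter by running it with -m pip install gensim.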

The ImportError from your edit comes from a bug in pip 20.0.0. Either upgrade to 20.0.1 by downloading get-pip.py and running:

python get-pip.py

Or downgrade to 19.3.1:

python get-pip.py pip==19.3.1
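To see whether the installed pip is the release mentioned above, a rough check along these lines could be used. This is a sketch under assumptions: parse_version and is_affected are hypothetical helpers, the parser is deliberately simplified (it reads only the leading digits of the first three version parts), and importlib.metadata requires Python 3.8 or later.

```python
import re


def parse_version(text):
    """Turn '20.0' or '19.3.1' into a 3-part tuple of ints, padding with zeros.

    Simplified on purpose: only the leading digits of each part are used.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = re.match(r"\d*", piece).group()
        parts.append(int(digits) if digits else 0)
    return tuple(parts + [0] * (3 - len(parts)))


def is_affected(pip_version):
    """True only for the 20.0.0 release named in the answer."""
    return parse_version(pip_version) == (20, 0, 0)


if __name__ == "__main__":
    try:
        from importlib.metadata import PackageNotFoundError, version
        installed = version("pip")
        print("pip", installed, "affected:", is_affected(installed))
    except ImportError:
        print("importlib.metadata is unavailable (needs Python 3.8+)")
    except PackageNotFoundError:
        print("pip is not installed for this interpreter")
```

Run it with the same interpreter that runs the corpus script, since each Python installation has its own pip.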
