Frans Huang

Reputation: 49

Gensim Segmentation Fault

I searched Google and the Gensim support forum, but I could not find a good answer.

Basically, I am implementing online learning for Doc2Vec using Gensim, but Gensim keeps throwing a seemingly random error called "Segmentation fault".

Please take a look at my sample code:

from gensim.models import Doc2Vec
from gensim.models.doc2vec import LabeledSentence
import random
import logging

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    sentence1 = "this is a test"
    sentence2 = "test test 123 test"
    sentence3 = "qqq zzz"
    sentence4 = "ppp"

    sentences = [
        LabeledSentence(sentence1.split(), ["p1"]),
        LabeledSentence(sentence2.split(), ["p2"])
    ]
    model = Doc2Vec(min_count=1, window=5, size=400, sample=1e-4, negative=5, workers=1)
    model.build_vocab(sentences)

    for a in range(2):
        random.shuffle(sentences)
        print([s.tags[0] for s in sentences])
        model.train(sentences)
    model.save("test.d2v")

    new_model = Doc2Vec.load("test.d2v")
    new_sentences = [
        LabeledSentence(sentence1.split(), ["n1"]),
        LabeledSentence(sentence3.split(), ["n2"])
    ]
    new_model.build_vocab(new_sentences, update=True)

    for a in range(4):
        random.shuffle(new_sentences)
        print([s.tags[0] for s in new_sentences])
        new_model.train(new_sentences)

Here is my error:

INFO:gensim.models.word2vec:training model with 1 workers on 7 vocabulary and 400 features, using sg=0 hs=0 sample=0.0001 negative=5 window=5
INFO:gensim.models.word2vec:expecting 2 sentences, matching count from corpus used for vocabulary survey
Segmentation fault

Can anyone explain why this happens and how to solve it?

Thanks

Upvotes: 1

Views: 1061

Answers (1)

gojomo

Reputation: 54173

A segmentation fault (that is, an illegal memory access) should be nearly impossible to trigger from your Python code. That suggests the problem could be specific to your installation or configuration (OS, Python, gensim, supporting libraries), or even a corrupted file.

Try clearing and reinstalling the Python environment and supporting libraries (such as NumPy and SciPy), then confirm that some of the examples bundled with gensim, for example the notebook in docs/notebooks/doc2vec-lee.ipynb, run without a segmentation fault. If you still get such faults with either the bundled examples or your own code, turn on debug logging, capture all output, and report the problem with full details of your OS, Python, gensim, and other library versions.
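
To make that report easier to put together, here is a minimal sketch of a diagnostics helper. The function names `collect_versions` and `enable_diagnostics` are hypothetical (they are not part of gensim), and using the standard-library `faulthandler` module is my own suggestion rather than something gensim requires. It only helps if the crash happens while Python is running, in which case it prints the Python-level traceback at the point of the fault.

```python
import faulthandler
import importlib
import logging
import platform
import sys


def collect_versions(packages=("numpy", "scipy", "gensim")):
    """Return a dict of OS, Python, and package versions for a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.replace("\n", " "),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            info[name] = "not installed"
        except Exception as exc:
            # A broken install can fail in other ways; record what happened.
            info[name] = "import failed: {!r}".format(exc)
        else:
            info[name] = getattr(module, "__version__", "unknown")
    return info


def enable_diagnostics():
    """Turn on debug logging and crash tracebacks (hypothetical helper)."""
    # Dump the Python traceback to stderr if the interpreter hits a fatal
    # signal such as SIGSEGV.
    faulthandler.enable()
    logging.basicConfig(level=logging.DEBUG)
    for key, value in collect_versions().items():
        logging.info("%s: %s", key, value)
```

Calling `enable_diagnostics()` at the top of the failing script, before building the model, and redirecting both stdout and stderr to a file should capture the version details together with whatever is printed at the moment of the crash.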

Upvotes: 1
