IssamLaradji

(Gensim) ValueError: invalid shape, with the alpha parameter

First, is this the right way to get the topic distributions of the corpus on which LDA was performed?

from gensim.models import LdaModel

lda = LdaModel(corpus, num_topics=500, update_every=0, passes=2)
# get the topic distribution of each document in the corpus
result = lda[corpus]
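For reference, lda[corpus] yields one sparse vector per document, a list of (topic_id, probability) pairs, rather than a fixed-length row. The sketch below uses made-up values and a hypothetical helper, to_dense_rows, to show how such vectors map onto a documents x topics table when the number of topics is supplied explicitly. It uses only the standard library and does not call gensim.

```python
from typing import List, Sequence, Tuple

# One sparse vector per document: (topic_id, probability) pairs,
# the same kind of data lda[corpus] yields (values here are made up).
SparseDoc = Sequence[Tuple[int, float]]


def to_dense_rows(docs: Sequence[SparseDoc], num_topics: int) -> List[List[float]]:
    """Expand sparse topic vectors into rows of length num_topics (hypothetical helper)."""
    rows = []
    for doc in docs:
        row = [0.0] * num_topics  # topics absent from the sparse vector count as 0
        for topic_id, prob in doc:
            row[topic_id] = prob
        rows.append(row)
    return rows


sparse_docs = [
    [(0, 0.7), (3, 0.25)],  # document 0: mostly topic 0
    [],                     # document 1: no topic survived
    [(2, 0.9)],             # document 2: mostly topic 2
]
dense = to_dense_rows(sparse_docs, num_topics=5)
print(len(dense), len(dense[0]))  # 3 5
```

Because the width comes from num_topics rather than from the data, an empty document still gets a full row of zeros.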

The problem appears when I add the alpha parameter to LdaModel and then convert the transformed corpus to a sparse matrix:

import gensim
from gensim.models import LdaModel

lda = LdaModel(corpus, num_topics=500, update_every=0, passes=2, alpha=0.5)
result = lda[corpus]
gensim.matutils.corpus2csc(result).T

The last line, which converts the gensim corpus to a sparse matrix, fails with ValueError: invalid shape.

I only get this error when I add the alpha parameter.

The complete traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-788-7fb54d5da9fb> in <module>()
----> 1 xp,xc=issam.lda(c)

C:\Anaconda\lib\issamKit.py in lda(X)
   1745      corpus=gensim.matutils.Sparse2Corpus(X.T)
   1746      lda = LdaModel(corpus,  num_topics=500, update_every=0, passes=2,alpha=1)
-> 1747      return lda,gensim.matutils.corpus2csc(lda[corpus]).T
   1748 def lsi(X):
   1749      import gensim

C:\Anaconda\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\matutils.pyc in corpus2csc(corpus, num_terms, dtype, num_docs, num_nnz, printprogress)
     97         data = numpy.asarray(data, dtype=dtype)
     98         indices = numpy.asarray(indices)
---> 99         result = scipy.sparse.csc_matrix((data, indices, indptr), shape=(num_terms, num_docs), dtype=dtype)
    100     return result
    101 

C:\Anaconda\lib\site-packages\scipy\sparse\compressed.pyc in __init__(self, arg1, shape, dtype, copy)
     66         # Read matrix dimensions given, if any
     67         if shape is not None:
---> 68             self.shape = shape   # spmatrix will check for errors
     69         else:
     70             if self.shape is None:

C:\Anaconda\lib\site-packages\scipy\sparse\base.pyc in set_shape(self, shape)
     69 
     70         if not (shape[0] >= 1 and shape[1] >= 1):
---> 71             raise ValueError('invalid shape')
     72 
     73         if (self._shape != shape) and (self._shape is not None):

ValueError: invalid shape
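The last frame shows scipy rejecting a shape with a dimension below 1. When num_terms is not given, corpus2csc has to infer the number of rows from the topic ids it sees. The sketch below assumes that inference works as 1 + the largest id, giving 0 for a corpus with no ids at all; this is an assumption about gensim internals, not code copied from gensim. It pairs that with the same condition that appears in the scipy frame of the traceback. Both helper names are hypothetical.

```python
from typing import Sequence, Tuple

SparseDoc = Sequence[Tuple[int, float]]


def inferred_shape(corpus: Sequence[SparseDoc]) -> Tuple[int, int]:
    """Guess (num_terms, num_docs) from the data alone (assumed behaviour)."""
    ids = [term_id for doc in corpus for term_id, _ in doc]
    num_terms = 1 + max(ids, default=-1)  # 0 when no document has any entry
    return num_terms, len(corpus)


def shape_is_valid(shape: Tuple[int, int]) -> bool:
    """Same condition as the scipy set_shape frame in the traceback above."""
    return shape[0] >= 1 and shape[1] >= 1


all_empty = [[], [], []]  # every document lost all of its topics
print(inferred_shape(all_empty))                  # (0, 3)
print(shape_is_valid(inferred_shape(all_empty)))  # False
```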

Answers (1)

Radim

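Pass the num_terms parameter to corpus2csc explicitly. In your case num_terms=500 (the number of topics), i.e. gensim.matutils.corpus2csc(result, num_terms=500).T.

lda[corpus] produces sparse vectors, but the CSC format needs a definite dimension. When you don't supply num_terms, corpus2csc guesses it from your data, which can produce a mismatch. That is likely what happens here: lda[corpus] leaves out topics whose probability falls below a small threshold, and with alpha=0.5 (or alpha=1) spread over 500 topics the per-document distributions are much flatter than with the default alpha of 1/num_topics, so few or no topics pass that threshold. If every vector comes back empty, the inferred number of rows is 0, and scipy rejects the resulting shape.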
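To see why alpha matters, the sketch below compares a flat topic distribution (roughly what a large symmetric alpha over 500 topics tends to produce for short documents) with a peaked one, and applies a 0.01 probability cutoff like the one lda[corpus] applies by default (minimum_probability in current gensim releases). The distributions are hand-made illustrations, not output from a trained model, and sparse_topics is a hypothetical stand-in for gensim's filtering.

```python
from typing import List, Sequence, Tuple

NUM_TOPICS = 500
CUTOFF = 0.01  # assumed default probability threshold used by lda[corpus]


def sparse_topics(dist: Sequence[float], cutoff: float = CUTOFF) -> List[Tuple[int, float]]:
    """Keep only (topic_id, prob) pairs at or above the cutoff."""
    return [(k, p) for k, p in enumerate(dist) if p >= cutoff]


# Flat: every topic gets the same share, 1/500 = 0.002, below the cutoff.
flat = [1.0 / NUM_TOPICS] * NUM_TOPICS

# Peaked: most of the mass on topics 7 and 42, the remaining 0.1 spread thinly.
peaked = [0.1 / (NUM_TOPICS - 2)] * NUM_TOPICS
peaked[7] = 0.6
peaked[42] = 0.3

print(len(sparse_topics(flat)))    # 0
print(len(sparse_topics(peaked)))  # 2
```

If every document looks like the flat case, corpus2csc receives only empty vectors and has nothing to infer a row count from, which is the situation where passing num_terms=500 explicitly keeps the shape valid.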