medo0070

Reputation: 511

Facing AttributeError: 'int' object has no attribute 'toarray' in topic modelling

I am trying to do topic modelling using LSA, with the following code:

and in the next line I have this:

top_n_words_lsa = get_top_n_words(10, 
                              lsa_keys, 
                              small_document_term_matrix, 
                              small_count_vectorizer)

for i in range(len(top_n_words_lsa)):
    print("Topic {}: ".format(i+1), top_n_words_lsa[i])

But I am facing this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15164/599799419.py in <module>
----> 1 top_n_words_lsa = get_top_n_words(10, 
      2                                   lsa_keys,
      3                                   small_document_term_matrix,
      4                                   small_count_vectorizer)
      5 

 ~\AppData\Local\Temp/ipykernel_15164/2401631730.py in get_top_n_words(n, keys, 
 document_term_matrix, count_vectorizer)
      11             if keys[i] == topic:
      12                 temp_vector_sum += document_term_matrix[i]
 ---> 13         temp_vector_sum = temp_vector_sum.toarray()
      14         top_n_word_indices = np.flip(np.argsort(temp_vector_sum)[0][-n:],0)
      15         top_word_indices.append(top_n_word_indices)

 AttributeError: 'int' object has no attribute 'toarray'

The related helper functions are defined below:

# Define helper functions
def get_top_n_words(n, keys, document_term_matrix, count_vectorizer):
    '''
    returns a list of n_topic strings, where each string contains the n most common
    words in a predicted category, in order
    '''
    top_word_indices = []
    for topic in range(n_topics):
        temp_vector_sum = 0
        for i in range(len(keys)):
            if keys[i] == topic:
                temp_vector_sum += document_term_matrix[i]
        temp_vector_sum = temp_vector_sum.toarray()
        top_n_word_indices = np.flip(np.argsort(temp_vector_sum)[0][-n:],0)
        top_word_indices.append(top_n_word_indices)
    top_words = []
    for topic in top_word_indices:
        topic_words = []
        for index in topic:
            temp_word_vector = np.zeros((1,document_term_matrix.shape[1]))
            temp_word_vector[:,index] = 1
            the_word = count_vectorizer.inverse_transform(temp_word_vector)[0][0]
            topic_words.append(the_word.encode('ascii').decode('utf-8'))
        top_words.append(" ".join(topic_words))
    return top_words

Can you please tell me what I am missing here?

Upvotes: 0

Views: 737

Answers (1)

LukasNeugebauer

Reputation: 1337

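You initialise temp_vector_sum to 0 and then add to it, so it starts out as an object of type int, and int does not define a toarray method. If no document is assigned to a given topic, nothing is ever added, and it is still an int when .toarray() is called. You could do something like: np.array([temp_vector_sum]).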
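As a rough sketch of another way around this (not a drop-in replacement for the helper in the question), the rows for each topic can be selected with a boolean mask and summed, with empty topics handled explicitly. The function name top_n_word_indices_per_topic is made up for illustration, and n_topics is passed in as a parameter here rather than read from a global as in the question. The demo uses a small dense NumPy array as a stand-in; I would expect the same indexing to work on a SciPy CSR matrix from CountVectorizer, but treat that as an assumption to check.

```python
import numpy as np


def top_n_word_indices_per_topic(n, keys, document_term_matrix, n_topics):
    """Return, for each topic, the column indices of its n most frequent terms.

    Hypothetical helper: topics with no assigned documents get an empty
    index array instead of triggering the AttributeError from the question.
    """
    keys = np.asarray(keys)
    top_word_indices = []
    for topic in range(n_topics):
        mask = keys == topic
        if not mask.any():
            # No document was assigned to this topic, so there is nothing to rank.
            top_word_indices.append(np.array([], dtype=int))
            continue
        # Sum the selected rows; np.asarray(...).ravel() flattens the result
        # to a 1-D vector whether the sum comes back as an array or a matrix.
        term_counts = np.asarray(document_term_matrix[mask].sum(axis=0)).ravel()
        # Sort ascending, reverse, and keep the first n (most frequent first).
        top_word_indices.append(np.argsort(term_counts)[::-1][:n])
    return top_word_indices


# Toy data: 3 documents, 3 terms; topic 1 has no documents on purpose.
demo_matrix = np.array([[2, 0, 1],
                        [0, 3, 0],
                        [1, 0, 4]])
demo_keys = [0, 2, 0]
demo_result = top_n_word_indices_per_topic(2, demo_keys, demo_matrix, n_topics=3)
print(demo_result)
```

It is probably also worth checking why a topic ends up with no documents in lsa_keys, for example by comparing n_topics with the set of values actually present in the keys.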

Upvotes: 0
