Reputation: 511
I am trying to do topic modelling using LSA, with the following code:
and in the next line I have this:
top_n_words_lsa = get_top_n_words(10,
                                  lsa_keys,
                                  small_document_term_matrix,
                                  small_count_vectorizer)
for i in range(len(top_n_words_lsa)):
    print("Topic {}: ".format(i+1), top_n_words_lsa[i])
But I am facing this error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15164/599799419.py in <module>
----> 1 top_n_words_lsa = get_top_n_words(10,
      2                                   lsa_keys,
      3                                   small_document_term_matrix,
      4                                   small_count_vectorizer)
      5

~\AppData\Local\Temp/ipykernel_15164/2401631730.py in get_top_n_words(n, keys, document_term_matrix, count_vectorizer)
     11             if keys[i] == topic:
     12                 temp_vector_sum += document_term_matrix[i]
---> 13         temp_vector_sum = temp_vector_sum.toarray()
     14         top_n_word_indices = np.flip(np.argsort(temp_vector_sum)[0][-n:],0)
     15         top_word_indices.append(top_n_word_indices)

AttributeError: 'int' object has no attribute 'toarray'
The related helper functions are defined below:
# Define helper functions
def get_top_n_words(n, keys, document_term_matrix, count_vectorizer):
    '''
    returns a list of n_topic strings, where each string contains the n most common
    words in a predicted category, in order
    '''
    top_word_indices = []
    for topic in range(n_topics):
        temp_vector_sum = 0
        for i in range(len(keys)):
            if keys[i] == topic:
                temp_vector_sum += document_term_matrix[i]
        temp_vector_sum = temp_vector_sum.toarray()
        top_n_word_indices = np.flip(np.argsort(temp_vector_sum)[0][-n:],0)
        top_word_indices.append(top_n_word_indices)
    top_words = []
    for topic in top_word_indices:
        topic_words = []
        for index in topic:
            temp_word_vector = np.zeros((1,document_term_matrix.shape[1]))
            temp_word_vector[:,index] = 1
            the_word = count_vectorizer.inverse_transform(temp_word_vector)[0][0]
            topic_words.append(the_word.encode('ascii').decode('utf-8'))
        top_words.append(" ".join(topic_words))
    return top_words
Can you please tell me what I am missing here?
Upvotes: 0
Views: 737
Reputation: 1337
You define temp_vector_sum as 0 and then add to it. So it's an object of type int. That class doesn't define a function toarray. You could do something like: np.array([temp_vector_sum]).
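One likely way for temp_vector_sum to still be an int when toarray is called is that no document was assigned to some topic, so nothing was ever added to the initial 0. The following is a minimal sketch of one way around that, not the original notebook's code: it selects each topic's rows with a boolean mask and sums them, which gives a vector of zeros for an empty topic. The function name top_word_indices_per_topic and the explicit n_topics parameter are made up for illustration. The toy matrix here is a dense NumPy array; the same indexing and sum should also work on the SciPy sparse matrix that CountVectorizer returns, but that is an assumption worth checking on your data.

```python
import numpy as np

def top_word_indices_per_topic(n, keys, document_term_matrix, n_topics):
    """Return, for each topic, the indices of its n most frequent terms."""
    keys = np.asarray(keys)
    result = []
    for topic in range(n_topics):
        mask = keys == topic  # documents assigned to this topic
        # Summing the selected rows yields a zero vector when no document
        # matches, instead of leaving a plain int behind.
        counts = np.asarray(document_term_matrix[mask].sum(axis=0)).ravel()
        # Sort ascending, keep the last n, then reverse to most frequent first.
        result.append(np.argsort(counts)[-n:][::-1])
    return result

# Toy data: 3 documents, 4 vocabulary terms; topic 1 has no documents.
dtm = np.array([[2, 0, 1, 0],
                [1, 0, 3, 0],
                [0, 5, 0, 1]])
keys = [0, 0, 2]
top = top_word_indices_per_topic(2, keys, dtm, n_topics=3)
for topic, indices in enumerate(top):
    print("Topic {}: ".format(topic + 1), indices)
```

For an empty topic all counts are zero, so the indices returned for it carry no meaning; you may prefer to skip such topics or report them separately.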
Upvotes: 0