manthangadhia

Reputation: 1

SentenceTransformer encoding throws obscure `TypeError: 'float' object is not subscriptable` when trying to embed a document list

My aim is to use BERTopic for semi-supervised, guided topic modelling on a set of parliamentary speeches, already broken down to sentence level, to figure out which mode of energy production each sentence is talking about. I used a rudimentary TF-IDF + cosine_similarity combination to compute similarities between my sentences and my list of topic-specific keywords. Sentences whose similarity score crossed a threshold were assigned the label of the matching topic and, following convention, the ambiguous sentences were labelled -1.
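For context, a minimal sketch of what that labelling step might look like is below. It is not my exact code: the keyword strings, the threshold value and the `assign_labels` helper are illustrative placeholders, and it assumes scikit-learn's `TfidfVectorizer` and `cosine_similarity`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical topic keywords and threshold, for illustration only.
topic_keywords = {
    0: "solar panels photovoltaic sun",
    1: "wind turbines offshore",
    2: "nuclear reactor uranium",
}
THRESHOLD = 0.2


def assign_labels(sentences, topic_keywords, threshold=THRESHOLD):
    """Label each sentence with its most similar topic, or -1 if below threshold."""
    topic_ids = list(topic_keywords)
    keyword_docs = [topic_keywords[t] for t in topic_ids]

    # Fit one vocabulary over sentences and keyword strings so the vectors are comparable.
    vectorizer = TfidfVectorizer(lowercase=True)
    matrix = vectorizer.fit_transform(list(sentences) + keyword_docs)
    sent_vecs = matrix[: len(sentences)]
    topic_vecs = matrix[len(sentences):]

    # sims has shape (n_sentences, n_topics).
    sims = cosine_similarity(sent_vecs, topic_vecs)
    best = sims.argmax(axis=1)
    best_score = sims.max(axis=1)
    return [topic_ids[i] if s >= threshold else -1 for i, s in zip(best, best_score)]


sentences = [
    "we should invest in offshore wind turbines",
    "the budget debate continues next week",
]
labels = assign_labels(sentences, topic_keywords)
print(labels)
```

A sentence that shares no vocabulary with any keyword string gets a similarity of zero for every topic and therefore falls back to -1.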

In my most recent attempt I created the sentence embeddings separately to see whether the error would go away, since my topic_model raised the same error when I let it use its default embedding model and parameters.

My docs list contains the sentences from my dataset in lower case (in some attempts I also tried removing punctuation). Am I missing a key dependency, or perhaps a crucial pre-processing step?

A snippet of the code I am trying to run:

from sentence_transformers import SentenceTransformer

docs = df['docs'].to_list()
assigned_labels = df['similarity_label'].to_list()

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedding_model.encode(docs, show_progress_bar=True, batch_size=18)

The error stack I receive:

     18 embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
---> 19 embeddings = embedding_model.encode(docs, show_progress_bar=True, batch_size=18)

    484     sentences_batch = sentences_sorted[start_index : start_index + batch_size]
--> 485     features = self.tokenize(sentences_batch)

--> 922     return self._first_module().tokenize(texts)

    152 batch1, batch2 = [], []
    153 for text_tuple in texts:
--> 154     batch1.append(text_tuple[0])
    155     batch2.append(text_tuple[1])
    156 to_tokenize = [batch1, batch2]

TypeError: 'float' object is not subscriptable

I don't understand where these floats are coming from. How can I deal with them?
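The traceback shows the tokenizer indexing into an element of the batch (`text_tuple[0]`), so at least one entry in docs is apparently a float rather than a string. One possible check is sketched below. It assumes the floats are missing values (NaN) coming from empty rows in the `docs` column, which I have not confirmed; the `find_non_strings` helper and the sample data are hypothetical.

```python
import math


def find_non_strings(docs):
    """Return (index, value) pairs for every entry that is not a str."""
    return [(i, d) for i, d in enumerate(docs) if not isinstance(d, str)]


# Hypothetical data standing in for df['docs'].to_list(); NaN marks a missing row.
docs = ["wind power is cheap", math.nan, "nuclear is reliable"]
assigned_labels = [1, -1, 2]

bad = find_non_strings(docs)
print(bad)

# Keep only positions that hold real strings, and filter the labels the same way
# so that docs and labels stay aligned.
keep = [i for i, d in enumerate(docs) if isinstance(d, str)]
clean_docs = [docs[i] for i in keep]
clean_labels = [assigned_labels[i] for i in keep]
```

If this turns up any entries, passing `clean_docs` to `encode` instead of `docs` would at least show whether those entries are what triggers the error.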

Upvotes: 0

Views: 275

Answers (0)
