Reputation: 5273
I need to compute embeddings for a large number of sentences (say 10K) during preprocessing, and at runtime I will have to compute the embedding vector for one sentence at a time (a user query), and then find the most similar sentence based on the embedding vectors (using cosine similarity).
I'm currently using sentence transformers, and their output size is 768, which is too large for my case. So I'd like to experiment with smaller sizes, like 256 or even 128.
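For reference, here is a minimal sketch of that setup, assuming sentence-transformers and NumPy (the model name and corpus are just placeholders):
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # placeholder 768-dim model
corpus = [f"sentence {i}" for i in range(10000)]  # placeholder for the real sentences
corpus_emb = model.encode(corpus, normalize_embeddings=True)  # precomputed once

query_emb = model.encode(["user query"], normalize_embeddings=True)  # at runtime
scores = corpus_emb @ query_emb.T  # cosine similarity, since vectors are unit-length
best_match = corpus[int(np.argmax(scores))]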
I'm familiar with PCA and quantization. However, both ChatGPT and Gemini suggested that I simply add a dense layer after the pooling layer. Example:
from sentence_transformers import SentenceTransformer, models
base_model = SentenceTransformer("all-mpnet-base-v2")  # e.g., a 768-dim base model
dense = models.Dense(in_features=base_model.get_sentence_embedding_dimension(), out_features=256)
model = SentenceTransformer(modules=[base_model, dense])
My problem with this is that I think I would have to retrain / fine-tune the model, which I cannot do since I don't have labeled data. But ChatGPT and Gemini claim that I could get away with this implementation without retraining or fine-tuning, although "it would be better".
I'm confused about how this could possibly work, because without training the weights in the dense layer would just be random.
Am I missing something, or could adding a dense layer without retraining / fine-tuning actually work?
Upvotes: 0
Views: 160
Reputation: 1558
Adding a dense layer without fine-tuning isn't practical: its weights are initialized randomly, so it disrupts the semantic space and makes the embeddings less meaningful for tasks like similarity search. As you mentioned, you can use techniques like PCA or UMAP to reduce the embeddings to your desired size; this is computationally efficient and preserves the relationships in the embedding space.
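For example, a minimal sketch with scikit-learn's PCA (the 256-component target and the model name are just placeholders); the PCA is fitted once on the precomputed corpus embeddings, and the same transform is applied to each query embedding at runtime:
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # placeholder 768-dim model
corpus = [f"sentence {i}" for i in range(10000)]  # placeholder corpus
corpus_emb = model.encode(corpus)                 # shape (10000, 768)

pca = PCA(n_components=256)                       # fit once during preprocessing
corpus_reduced = pca.fit_transform(corpus_emb)    # shape (10000, 256)

query_emb = model.encode(["user query"])          # at runtime
query_reduced = pca.transform(query_emb)          # shape (1, 256)

# Re-normalize before cosine similarity, since PCA does not preserve vector norms
corpus_reduced /= np.linalg.norm(corpus_reduced, axis=1, keepdims=True)
query_reduced /= np.linalg.norm(query_reduced, axis=1, keepdims=True)
scores = corpus_reduced @ query_reduced.T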
However, I think the best approach would be to explore other sentence-transformer models such as all-MiniLM-L6-v2 (384 dimensions), which are already optimized for smaller embedding sizes while maintaining good accuracy. You can find a list of such models here.
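For instance, a minimal usage sketch with that model (standard sentence-transformers API):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
emb = model.encode(["an example sentence"], normalize_embeddings=True)
print(emb.shape)  # (1, 384)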
Upvotes: 1