Reputation: 5273
I need to compute embeddings for a large number of sentences (say 10K) during preprocessing, and at runtime I will have to compute the embedding vector for one sentence at a time (a user query), and then find the most similar sentence based on the embedding vectors (using cosine similarity).
I'm currently using sentence transformers, and their output size is 768, which is too large for my case. So I'd like to experiment with smaller sizes, like 256 or even 128.
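For reference, here is a minimal sketch of that setup, assuming sentence-transformers and NumPy (the model name and corpus are just placeholders):
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # placeholder 768-dim model
corpus = [f"sentence {i}" for i in range(10000)]  # placeholder for the real sentences
corpus_emb = model.encode(corpus, normalize_embeddings=True)  # precomputed once

query_emb = model.encode(["user query"], normalize_embeddings=True)  # at runtime
scores = corpus_emb @ query_emb.T  # cosine similarity, since vectors are unit-length
best_match = corpus[int(np.argmax(scores))]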
I'm familiar with PCA and quantization. However, both ChatGPT and Gemini suggested that I simply add a dense layer after the pooling layer. Example:
from sentence_transformers import SentenceTransformer, models
base_model = SentenceTransformer("all-mpnet-base-v2")  # e.g., a 768-dim base model
dense = models.Dense(in_features=base_model.get_sentence_embedding_dimension(), out_features=256)
model = SentenceTransformer(modules=[base_model, dense])
My problem with this is that I think I would have to retrain / fine-tune the model, which I cannot do since I don't have labeled data. But ChatGPT and Gemini claim that I could get away with this implementation without retraining or fine-tuning, although "it would be better".
I'm confused about how this could possibly work, because without training the weights in the dense layer would just be random.
Am I missing something, or could adding a dense layer without retraining / fine-tuning actually work?
Upvotes: 0
Views: 160
Reputation: 1558
Adding a dense layer without fine-tuning isn't practical: its weights are initialized randomly, so it disrupts the semantic space and makes the embeddings less meaningful for tasks like similarity search. As you mentioned, you can use techniques like PCA or UMAP to reduce the embeddings to your desired size; this is computationally efficient and preserves the relationships in the embedding space.
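For example, a minimal sketch with scikit-learn's PCA (the 256-component target and the model name are just placeholders); the PCA is fitted once on the precomputed corpus embeddings, and the same transform is applied to each query embedding at runtime:
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # placeholder 768-dim model
corpus = [f"sentence {i}" for i in range(10000)]  # placeholder corpus
corpus_emb = model.encode(corpus)                 # shape (10000, 768)

pca = PCA(n_components=256)                       # fit once during preprocessing
corpus_reduced = pca.fit_transform(corpus_emb)    # shape (10000, 256)

query_emb = model.encode(["user query"])          # at runtime
query_reduced = pca.transform(query_emb)          # shape (1, 256)

# Re-normalize before cosine similarity, since PCA does not preserve vector norms
corpus_reduced /= np.linalg.norm(corpus_reduced, axis=1, keepdims=True)
query_reduced /= np.linalg.norm(query_reduced, axis=1, keepdims=True)
scores = corpus_reduced @ query_reduced.T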
However, I think the best approach would be to explore other sentence-transformer models such as all-MiniLM-L6-v2 (384 dimensions), which are already optimized for smaller embedding sizes while maintaining good accuracy. You can find a list of such models here.
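For instance, a minimal usage sketch with that model (standard sentence-transformers API):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
emb = model.encode(["an example sentence"], normalize_embeddings=True)
print(emb.shape)  # (1, 384)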
Upvotes: 1