Anoop kottappuram

Reputation: 224

Same sentence produces different vectors in XLNet

I computed vectors for the same sentence twice using XLNet through embedding-as-service. The model produces a different embedding each time, so the cosine similarity is not 1 and the Euclidean distance is not 0. With BERT it works fine. For example, given

vec1 = en.encode(texts=['he is anger'],pooling='reduce_mean')
vec2 = en.encode(texts=['he is anger'],pooling='reduce_mean')

the model (XLNet) reports that these two identical sentences are dissimilar.
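For anyone wanting to reproduce the check, here is a minimal sketch of the two comparisons with NumPy. The vectors are made-up stand-ins (in the question they would be `vec1` and `vec2` returned by `en.encode(...)`), and the helper names `cosine_similarity` and `euclidean_distance` are my own, not part of embedding-as-service.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (inputs are flattened first)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (inputs are flattened first)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.linalg.norm(a - b))

# Stand-in embeddings with a leading batch dimension of 1.
vec1 = np.array([[0.12, -0.40, 0.88, 0.05]])
vec2 = vec1.copy()                                      # a deterministic encoder would give this
noisy = vec1 + np.array([[0.03, -0.02, 0.01, 0.04]])    # what a nondeterministic encoder might give

print(cosine_similarity(vec1, vec2), euclidean_distance(vec1, vec2))
print(cosine_similarity(vec1, noisy), euclidean_distance(vec1, noisy))
```

Because of floating-point rounding, comparing the cosine with `np.isclose(..., 1.0)` is safer than testing for exact equality.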

Upvotes: 5

Views: 271

Answers (2)

Berkay Berabi

Reputation: 2338

This is caused by the dropout layers in the model. Dropout should be turned off during inference, but there is a bug in the library that leaves it active. The bug is discussed in the issue linked below and has apparently still not been fixed.

See the discussion here: https://github.com/amansrivastava17/embedding-as-service/issues/45
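To illustrate why active dropout gives a different vector on every call, here is a toy sketch in plain NumPy. It is not XLNet and not the library's code: `dropout`, `toy_encoder`, the layer sizes and the 0.5 rate are all hypothetical, chosen only to make the effect visible.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: while training, zero each element with probability `rate`."""
    if not training or rate == 0.0:
        return x  # inference: identity, so the output is deterministic
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

def toy_encoder(x, weights, training, rng, rate=0.5):
    """One dense layer followed by dropout (a stand-in for a real encoder)."""
    hidden = np.tanh(x @ weights)
    return dropout(hidden, rate, training, rng)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))          # the "same sentence", already turned into numbers
weights = rng.normal(size=(8, 64))   # fixed model weights

# Dropout left on: a fresh random mask is drawn on every call.
train_a = toy_encoder(x, weights, training=True, rng=rng)
train_b = toy_encoder(x, weights, training=True, rng=rng)

# Dropout off, as it should be at inference time.
eval_a = toy_encoder(x, weights, training=False, rng=rng)
eval_b = toy_encoder(x, weights, training=False, rng=rng)

print("same output with dropout on: ", np.array_equal(train_a, train_b))
print("same output with dropout off:", np.array_equal(eval_a, eval_b))
```

The input and the weights never change between calls; only the random mask does, which is the same symptom as in the question.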

In the meantime, as suggested by @Davide Fiocco, you can use the more direct approaches from HuggingFace: forward, generate, or pipeline.

Upvotes: 1

Davide Fiocco

Reputation: 5914

As a workaround, if you have some flexibility, what about using the vanilla transformers library instead?

Results from

from transformers import pipeline
embedder = pipeline("feature-extraction", model="xlnet-base-cased")
embedder("he is anger")

are deterministic.
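If you want a single vector per sentence, roughly like `pooling='reduce_mean'` in the question, something along these lines might work. It is only a sketch: `mean_pooled_embedding` is a name I made up, the average includes XLNet's special tokens, and it is not guaranteed to match what embedding-as-service computes. It assumes `torch`, `transformers` and `sentencepiece` are installed; the imports sit inside the function so the heavy dependencies are only loaded when it is called.

```python
def mean_pooled_embedding(text, model_name="xlnet-base-cased"):
    """Return one vector for `text` by averaging the final hidden states over tokens."""
    # Imported lazily: these are heavy dependencies and the model is downloaded on first use.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # make sure dropout is disabled (from_pretrained already does this)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, hidden_size)

    # Average over the token axis; special tokens are included in the mean.
    return hidden.mean(dim=1).squeeze(0).numpy()
```

Calling `mean_pooled_embedding("he is anger")` twice and comparing the two results with `numpy.allclose` should then report that they match, since dropout is off.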

Upvotes: 0
