Reputation: 447
I am trying to calculate token similarity in spaCy, i.e. how close word tokens are to one another. I am using spaCy version 2.0.5. Here is my trivial example.
import spacy
from spacy.lang.en import English
from spacy.tokenizer import Tokenizer
nlp = spacy.load('en')
x = nlp(u'apple')
y = nlp(u'apple')
x.similarity(y)
This returns -81216639937292144.0 but I had expected it to be 1.0.
In addition
x = nlp(u'apple')
y = nlp(u'apples')
x.similarity(y)
returns 0.0038385278814858344 which seems wrong as well.
How should I compute token similarity so that it works? I am really trying to stay within spaCy (rather than using a different string distance package), but I would also welcome suggestions if this just can't be done in spaCy.
Upvotes: 0
Views: 2383
Reputation: 65
I faced the same problem with version 2.0.5. You can roll back to version 2.0.2, where you get a score like 1.0000000593284066 when comparing 'apples' to 'apples'.
To do this, first uninstall the packages that spaCy 2.0.5 depends on:
for dep in $(pip show spacy | grep Requires | sed 's/Requires: //g; s/,//g') ; do pip uninstall -y $dep ; done
Then install version 2.0.2:
pip install spacy=='2.0.2'
Next, validate the installation:
python -m spacy validate
It might ask you to install some library, such as ftfy, and when you try to install it, pip will report that it is already installed. In that case, uninstall the library first and then reinstall it (this might happen 3-4 times for different libraries). This is necessary because many libraries get updated to their latest versions when spaCy 2.0.5 is installed. Lastly, download the model:
python -m spacy download en_core_web_sm
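After a rollback like this, it may be worth confirming which version Python actually sees before re-running the similarity test. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_version` is made up for illustration and is not part of spaCy:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return None


# Prints the spaCy version visible to this interpreter (None if not installed).
print("spacy:", installed_version("spacy"))
```

If this prints something other than 2.0.2, the rollback probably went into a different environment than the one you are running.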
Upvotes: 0
Reputation: 1563
I tried the same thing using spaCy version 0.100.7, and it works fine for me:
import spacy
from spacy.en import English
from spacy.tokenizer import Tokenizer
nlp = spacy.load('en')
x = nlp(u'apple')
y = nlp(u'apple')
print (x.similarity(y)) # prints 0.999999947205
x = nlp(u'apple')
y = nlp(u'apples')
print (x.similarity(y)) # prints 0.6678450944
Can you please check your version of spaCy? Also, have you installed only the default 'en' model?
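For reference, as far as I know the default similarity score is the cosine similarity of the two vectors, so identical inputs should give about 1.0 and nothing should fall outside the range -1 to 1. A result like -81216639937292144.0 therefore points at a problem in the install or the vectors rather than in your code. A rough plain-Python sketch of the calculation follows; the three-dimensional vectors are made up for illustration and are not real spaCy vectors:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; returning 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for word vectors.
apple = [0.2, -0.5, 0.7]
apples = [0.25, -0.4, 0.6]

same = cosine_similarity(apple, apple)    # identical vectors
close = cosine_similarity(apple, apples)  # similar but not identical vectors
print(same, close)
```

Floating-point rounding can push the identical case a hair above or below 1.0, which would explain scores like 0.999999947205.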
Upvotes: 1