Reputation: 313
I've been playing around with analogy queries over some publicly available word embeddings, in particular using the following:

- numberbatch-en-19.08 from https://github.com/commonsense/conceptnet-numberbatch
- glove.42B.300d from https://nlp.stanford.edu/projects/glove/
- glove.840B.300d from https://nlp.stanford.edu/projects/glove/

I'm doing some basic queries of the form (where queryTarget is what I am looking for):
baseSource:baseTarget :: querySource:queryTarget
e.g. man:woman :: king:queen
Candidates for queryTarget are ranked with one of these matching strategies:

cosine_similarity(baseTarget-baseSource, queryTarget-querySource)
cosine_similarity(baseTarget-baseSource, queryTarget-querySource) * cosine_similarity(baseTarget-queryTarget, baseSource-querySource)
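For concreteness, here is a minimal sketch of how I'm scoring candidates with those two strategies, assuming the embeddings have been parsed into a plain dict of numpy arrays. The `load_vectors` helper and the brute-force loop over the whole vocabulary are just illustrative; they are not part of either project's API.

```python
import numpy as np

def load_vectors(path):
    """Parse a whitespace-separated embeddings text file (GloVe / Numberbatch style)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 3:  # skip the "count dim" header line Numberbatch files start with
                continue
            vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors

def cosine_similarity(a, b):
    """Standard cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(vectors, base_source, base_target, query_source, candidate, paired=False):
    """Score one candidate queryTarget for baseSource:baseTarget :: querySource:?"""
    bs, bt = vectors[base_source], vectors[base_target]
    qs, qt = vectors[query_source], vectors[candidate]
    # Strategy 1: similarity of the two offset vectors.
    s = cosine_similarity(bt - bs, qt - qs)
    if paired:
        # Strategy 2: additionally require the "cross" offsets to line up.
        s *= cosine_similarity(bt - qt, bs - qs)
    return s

# man:woman :: king:?  -- rank every word in the vocabulary by its score.
vectors = load_vectors("glove.42B.300d.txt")  # or numberbatch-en-19.08.txt
scores = {w: score(vectors, "man", "woman", "king", w)
          for w in vectors if w not in ("man", "woman", "king")}
print(sorted(scores, key=scores.get, reverse=True)[:20])
```

The full-vocabulary loop is obviously brute force; it's only there to make explicit what I mean by the matching strategies below.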
For the query:
man:woman :: king:?
The glove data gives me the correct queen, lady, princess results for the various matching strategies. However, conceptnet gives female_person, adult_female, king_david's_harp as the top 3, which I would not expect (queen is not in the top 20). Similarly, I see poor results regularly displace expected results that I do see in the glove results.
Does the conceptnet embedding require some sort of additional tweaking before I can use it? Or is it just not tailored/suited for English analogies?
Upvotes: 3
Views: 542