George

Reputation: 105

Is there any way to get the actual vector embedding of a word or set of characters using flair NLP, i.e. flair embeddings?

Basically, I'm trying to use a custom flair language model to get a word's or sentence's embedding as a vector. Is this possible, or do flair embeddings only work when using flair NER models?

When using the embeddings' .embed() function I receive an output like "[Sentence: "pain" [− Tokens: 1]]", whereas I'm looking for the vector of continuous numbers.

Thank you.

Upvotes: 1

Views: 885

Answers (2)

teoML

Reputation: 836

from flair.embeddings import TransformerDocumentEmbeddings
from flair.data import Sentence

# init embedding by loading the transformer model of your choice
embedding = TransformerDocumentEmbeddings("dbmdz/bert-base-german-uncased")

# create a sentence ("The lawn grows quickly", German to match the German BERT model)
sentence = Sentence('Der Rasen wächst schnell')

# embed the whole sentence as a single document-level vector
embedding.embed(sentence)

# get the vector
print(sentence.embedding)
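
Note that TransformerDocumentEmbeddings produces one vector for the whole sentence. If you want one vector per token instead, a minimal sketch using flair's TransformerWordEmbeddings should look roughly like this (the model name "bert-base-uncased" is just an example, not something from the question):

from flair.embeddings import TransformerWordEmbeddings
from flair.data import Sentence

# word-level transformer embeddings; model name is an arbitrary example
embedding = TransformerWordEmbeddings("bert-base-uncased")

sentence = Sentence('The grass is green .')
embedding.embed(sentence)

# each token now carries its own embedding vector
for token in sentence:
    print(token)
    print(token.embedding)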

Upvotes: 0

dennlinger

Reputation: 11440

I'm quite confused, because there is an official tutorial on word embeddings by the flair authors themselves, which seems to cover exactly this topic. I suspect the problem is that you are confusing the processed Sentence object returned by .embed() with the actual .embedding property of that object.

In any case, you can simply iterate over the word embeddings of individual tokens like so (taken from the tutorial mentioned above):

from flair.embeddings import WordEmbeddings
from flair.data import Sentence

# init embedding
glove_embedding = WordEmbeddings('glove')

# create sentence.
sentence = Sentence('The grass is green .')

# embed a sentence using glove.
glove_embedding.embed(sentence)

# now check out the embedded tokens.
for token in sentence:
    print(token)
    print(token.embedding)

I am not familiar enough with flair to know whether you can apply it to arbitrary character sequences, but it worked for tokens for me.
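
Since you mention a custom flair language model: flair's FlairEmbeddings are character-level language model embeddings, so something along these lines should in principle work for short character sequences too. This is an untested sketch; 'news-forward' is a pre-trained model name, and the custom-model path in the comment is purely hypothetical:

from flair.embeddings import FlairEmbeddings
from flair.data import Sentence

# character-level LM embeddings; a custom model could presumably be
# loaded by path instead, e.g. FlairEmbeddings('path/to/best-lm.pt')
flair_embedding = FlairEmbeddings('news-forward')

# wrap the character sequence / word in a Sentence
sentence = Sentence('pain')

# embed and inspect the per-token vectors
flair_embedding.embed(sentence)
for token in sentence:
    print(token.embedding)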

Upvotes: 1
