Paragraph embedding with ELMo

Question

I'm trying to understand how to prepare paragraphs for ELMo vectorization.

The docs only show how to embed multiple sentences/words at a time.

e.g.

sentences = [["the", "cat", "is", "on", "the", "mat"],
             ["dogs", "are", "in", "the", "fog", ""]]
elmo(
    inputs={
        "tokens": sentences,
        "sequence_len": [6, 5]
    },
    signature="tokens",
    as_dict=True
)["elmo"]
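
For reference, here is a minimal sketch of how I understand the padding in that example: shorter sentences are padded with "" up to the longest one, and sequence_len holds the true lengths. The helper name pad_batch is my own and hypothetical; it is not part of the ELMo module.

```python
def pad_batch(tokenized_sentences, pad_token=""):
    """Pad every token list to the longest length and return (padded, lengths).

    pad_batch is a hypothetical helper written for this question,
    not a function provided by the ELMo module.
    """
    # True (unpadded) length of each sentence, used as "sequence_len".
    lengths = [len(sentence) for sentence in tokenized_sentences]
    max_len = max(lengths, default=0)
    # Append pad tokens so that every row has the same number of entries.
    padded = [
        sentence + [pad_token] * (max_len - len(sentence))
        for sentence in tokenized_sentences
    ]
    return padded, lengths


tokens, sequence_len = pad_batch([
    ["the", "cat", "is", "on", "the", "mat"],
    ["dogs", "are", "in", "the", "fog"],
])
```

With this, tokens and sequence_len match the values passed as "tokens" and "sequence_len" in the docs example above.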

As I understand it, this returns 2 vectors, each representing one of the given sentences. How would I go about preparing the input data to vectorize a whole paragraph containing multiple sentences? Note that I would like to use my own preprocessing.

Can this be done like so?

sentences = [["", "the", "cat", "is", "on", "the", "mat", ".", "",
              "", "dogs", "are", "in", "the", "fog", ".", ""]]

or maybe like so?

sentences = [["the", "cat", "is", "on", "the", "mat", ".", 
              "dogs", "are", "in", "the", "fog", "."]]
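
To make the second option concrete, this is a sketch of how I would build that input from my own preprocessing output: all sentences of the paragraph are concatenated into a single token sequence, and sequence_len gets one entry with the total token count. The helper name paragraph_to_batch is hypothetical, and whether ELMo produces a meaningful paragraph vector from such input is exactly what I am unsure about.

```python
def paragraph_to_batch(paragraph_sentences):
    """Flatten a paragraph (a list of token lists) into a one-row batch.

    paragraph_to_batch is a hypothetical helper for this question,
    not part of the ELMo module. It returns (tokens, sequence_len).
    """
    # Concatenate the tokens of all sentences, keeping their order.
    flat = [token for sentence in paragraph_sentences for token in sentence]
    # One row in the batch, so sequence_len has exactly one entry.
    return [flat], [len(flat)]


paragraph = [
    ["the", "cat", "is", "on", "the", "mat", "."],
    ["dogs", "are", "in", "the", "fog", "."],
]
tokens, sequence_len = paragraph_to_batch(paragraph)
```

The resulting tokens and sequence_len would then be passed as "tokens" and "sequence_len" in the same call as in the docs example.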
