Reputation: 53
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by the preprocessing model

BERT_MODEL = "https://tfhub.dev/google/experts/bert/wiki_books/2"
PREPROCESS_MODEL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
preprocess = hub.load(PREPROCESS_MODEL)
bert = hub.load(BERT_MODEL)

sentences = tf.constant(["a corgi surfing a wave at sunset"])  # example input
inputs = preprocess(sentences)  # tokenizes and packs to the preprocessor's default length
outputs = bert(inputs)          # dict with "pooled_output" and "sequence_output"
I'm trying to get BERT embeddings for text-to-image generation, but I could not find how to change the maximum sequence length in these functions. Can you please explain how to do it?
Upvotes: 1
Views: 572
Reputation:
BERT can only take input sequences up to 512 tokens in length. This is a significant limitation, since many common document types are much longer than 512 words. It inherits its architecture from the Transformer, which uses self-attention, feed-forward layers, residual connections, and layer normalization as its foundational components.
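Within that 512-token cap, you can usually choose the packed sequence length yourself. Calling preprocess(sentences) as in the question packs to the preprocessor's default length, but as far as I remember the TF Hub preprocessing SavedModel also exposes tokenize and bert_pack_inputs sub-models, and the latter accepts a seq_length argument; treat those attribute names as an assumption and double-check the model page. A rough sketch:

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops the preprocessing model needs

preprocess = hub.load("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert = hub.load("https://tfhub.dev/google/experts/bert/wiki_books/2")

sentences = tf.constant(["an example caption for text-to-image generation"])

# Tokenize first, then pack to a custom length instead of the default.
tokenized = preprocess.tokenize(sentences)
pack = hub.KerasLayer(preprocess.bert_pack_inputs,
                      arguments=dict(seq_length=256))  # any value up to 512
inputs = pack([tokenized])
outputs = bert(inputs)
embeddings = outputs["pooled_output"]  # or outputs["sequence_output"]

Since the encoder's positional embeddings only cover 512 positions, seq_length values above 512 will not work no matter how you pack the inputs.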
The problems with BERT and large input documents are caused by a few aspects of BERT's architecture:
BERT learns positional embeddings for at most 512 positions, and its designers saw a considerable drop in performance on documents longer than 512 tokens. As a result, this limit was set to protect against low-quality output.
Self-attention has quadratic, O(n²), time and memory complexity in the sequence length. Because of this, the models require a lot of resources to fine-tune: the more input you provide, the more resources you will need. For most users, the quadratic cost makes this too expensive (illustrated just below the list).
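To make the quadratic growth concrete: every attention head materializes an n × n score matrix, so doubling the sequence length quadruples the number of entries per head and layer. A quick back-of-the-envelope check (raw entry counts only, ignoring the hidden dimension):

for n in (128, 256, 512, 1024):
    print(n, n * n)  # 16384, 65536, 262144, 1048576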
For details on how to deal with long sequences, please refer to this; one common workaround is also sketched below. Thank you!
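The usual idea, which is not necessarily what the linked article describes, is to split a long document into chunks that fit under the 512-token limit, embed each chunk, and pool the chunk embeddings. A minimal sketch reusing the preprocess and bert objects from the question; the word-based chunking and the mean pooling are arbitrary choices here:

import tensorflow as tf

def embed_long_text(text, preprocess, bert, words_per_chunk=200):
    # Naively split on whitespace; a real pipeline would chunk on tokens, not words.
    words = text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]
    outputs = bert(preprocess(tf.constant(chunks)))
    # Mean-pool the per-chunk pooled vectors into one document embedding.
    return tf.reduce_mean(outputs["pooled_output"], axis=0)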
Upvotes: 1