Reputation: 53
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by the preprocessing model

BERT_MODEL = "https://tfhub.dev/google/experts/bert/wiki_books/2"
PREPROCESS_MODEL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
preprocess = hub.load(PREPROCESS_MODEL)
bert = hub.load(BERT_MODEL)

sentences = tf.constant(["a corgi surfing a wave at sunset"])  # example input
inputs = preprocess(sentences)  # tokenizes and packs to the preprocessor's default length
outputs = bert(inputs)          # dict with "pooled_output" and "sequence_output"
I'm trying to get BERT embeddings for text-to-image generation, but I could not find how to change the maximum sequence length in these functions. Can you please explain how to do it?
Upvotes: 1
Views: 572
Reputation:
BERT can only take input sequences up to 512 tokens in length. This is a significant limitation, since many common document types are much longer than 512 words. It inherits its architecture from the Transformer, which uses self-attention, feed-forward layers, residual connections, and layer normalization as its foundational components.
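Within that 512-token cap, you can usually choose the packed sequence length yourself. Calling preprocess(sentences) as in the question packs to the preprocessor's default length, but as far as I remember the TF Hub preprocessing SavedModel also exposes tokenize and bert_pack_inputs sub-models, and the latter accepts a seq_length argument; treat those attribute names as an assumption and double-check the model page. A rough sketch:

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops the preprocessing model needs

preprocess = hub.load("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert = hub.load("https://tfhub.dev/google/experts/bert/wiki_books/2")

sentences = tf.constant(["an example caption for text-to-image generation"])

# Tokenize first, then pack to a custom length instead of the default.
tokenized = preprocess.tokenize(sentences)
pack = hub.KerasLayer(preprocess.bert_pack_inputs,
                      arguments=dict(seq_length=256))  # any value up to 512
inputs = pack([tokenized])
outputs = bert(inputs)
embeddings = outputs["pooled_output"]  # or outputs["sequence_output"]

Since the encoder's positional embeddings only cover 512 positions, seq_length values above 512 will not work no matter how you pack the inputs.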
The problems with BERT and large input documents are caused by a few aspects of BERT's architecture:
BERT learns positional embeddings for at most 512 positions, and its designers saw a considerable drop in performance on documents longer than 512 tokens. As a result, this limit was set to protect against low-quality output.
Self-attention has quadratic, O(n²), time and memory complexity in the sequence length. Because of this, the models require a lot of resources to fine-tune: the more input you provide, the more resources you will need. For most users, the quadratic cost makes this too expensive (illustrated just below the list).
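To make the quadratic growth concrete: every attention head materializes an n × n score matrix, so doubling the sequence length quadruples the number of entries per head and layer. A quick back-of-the-envelope check (raw entry counts only, ignoring the hidden dimension):

for n in (128, 256, 512, 1024):
    print(n, n * n)  # 16384, 65536, 262144, 1048576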
For details on how to deal with long sequences, please refer to this; one common workaround is also sketched below. Thank you!
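The usual idea, which is not necessarily what the linked article describes, is to split a long document into chunks that fit under the 512-token limit, embed each chunk, and pool the chunk embeddings. A minimal sketch reusing the preprocess and bert objects from the question; the word-based chunking and the mean pooling are arbitrary choices here:

import tensorflow as tf

def embed_long_text(text, preprocess, bert, words_per_chunk=200):
    # Naively split on whitespace; a real pipeline would chunk on tokens, not words.
    words = text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]
    outputs = bert(preprocess(tf.constant(chunks)))
    # Mean-pool the per-chunk pooled vectors into one document embedding.
    return tf.reduce_mean(outputs["pooled_output"], axis=0)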
Upvotes: 1