Reputation: 33
I am trying to run the tokenizer for BERT, but I keep getting an error. Can anyone help me see where I am going wrong?
FullTokenizer = bert.bert_tokenization.FullTokenizer
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1", trainable=False)
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = FullTokenizer(vocab_file, do_lower_case)
Error:
AttributeError                            Traceback (most recent call last)
in ()
----> 1 FullTokenizer = bert.bert_tokenization.FullTokenizer
      2 bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1",
      3                             trainable=False)
      4 vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
      5 do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
AttributeError: module 'bert' has no attribute 'bert_tokenization'
For reference, these are all the packages I installed and imported:
!pip install bert-for-tf2
!pip install sentencepiece
!pip install bert-tensorflow
!pip install tensorflow==2.0
try:
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers
import bert
from bert import tokenization
Upvotes: 2
Views: 3650
Reputation: 1
!pip install bert-tensorflow
!pip install --upgrade bert
!pip install tokenization
from bert import tokenization
from bert.tokenization.bert_tokenization import FullTokenizer
tokenizer = FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case)
Upvotes: 0
Reputation: 61
I ran into a similar situation before.
Try looking for a folder named "bert" in the directory where your script/notebook is being run. Delete that folder or rename it to something other than "bert". It is quite likely that when you import bert, Python picks up that folder instead of the bert-for-tf2 package you installed in the Python site-packages.
If that still doesn't work, try:
from bert import tokenization
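If you are not sure which "bert" Python is actually picking up, a quick check along these lines may help. This is only a minimal sketch using the standard library; the helper names locate_module and is_shadowed_by_cwd are made up for illustration and are not part of any BERT package.

```python
import importlib.util
import os


def locate_module(name):
    """Return the file or folder Python would load `name` from.

    Returns None if the module is not found or has no location on disk
    (for example built-in modules).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.has_location and spec.origin is not None:
        return spec.origin
    # A folder without __init__.py has no origin, only search locations.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_shadowed_by_cwd(name):
    """Return True if `name` resolves to something under the working directory."""
    path = locate_module(name)
    if path is None:
        return False
    # os.path.join(cwd, "") adds a trailing separator so "/a/b" does not match "/a/bc".
    cwd_prefix = os.path.join(os.path.realpath(os.getcwd()), "")
    return os.path.realpath(path).startswith(cwd_prefix)


# Hypothetical usage for this question: see where "bert" comes from.
print(locate_module("bert"))
print(is_shadowed_by_cwd("bert"))
```

If the printed path points into your working directory rather than site-packages, then a local folder or file is probably the one being imported.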
Upvotes: 3