Reputation: 61
I am writing deep-learning code that embeds text with a BERT-based model. I am seeing unexpected issues in code that was working fine before. Below is the snippet:
from transformers import DistilBertModel, DistilBertTokenizer

sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]

# Embed text using a BERT-based model.
text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')

print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)  # error comes here
The output and error are below:
['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
load_data()
File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable
As you can see, text_tokenizer.tokenize()
works fine. I tried force-downloading the tokenizer and even changing the cache directory, but to no effect.
The code runs fine on another machine (a friend's laptop) and was also working on mine some time back, before I tried installing torchvision and using the PIL library for the image part. Now it always gives this error.
OS: macOS 11.6, using a Conda environment, python=3.9
Upvotes: 2
Views: 6066
Reputation: 61
This was a rather easy fix. At some point I had removed the pinned transformers version from the environment.yml
file, so Conda resolved a 2.x major version with python=3.9, and transformers 2.x does not allow calling the tokenizer directly (the callable-tokenizer API was only added in v3.0). I pinned the version again as transformers=4.11.2
and added the channel conda-forge
to the yml file. After that I was able to get past this error.
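For context, tokenizers gained a `__call__` method in transformers v3.0; on a 2.x install, `tokenizer(sentences, ...)` raises exactly this `TypeError`, and you have to use `tokenizer.encode_plus` / `tokenizer.batch_encode_plus` instead. A minimal sketch of the version gate (the helper name here is my own, not part of transformers):

```python
def supports_callable_tokenizer(transformers_version: str) -> bool:
    # Tokenizers became directly callable in transformers 3.0; before that,
    # batch encoding went through tokenizer.batch_encode_plus(...).
    major = int(transformers_version.split(".")[0])
    return major >= 3

# The environment that failed had resolved to a 2.x build;
# 4.11.2 (the version pinned above) supports the callable API.
print(supports_callable_tokenizer("2.11.0"))  # False
print(supports_callable_tokenizer("4.11.2"))  # True
```

You can check which version your environment actually resolved with `transformers.__version__` before deciding which API to use.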
Upvotes: 3