amitgh

Reputation: 61

Huggingface Tokenizer object is not callable

I am writing deep-learning code that embeds text with a BERT-based model. A snippet that was working fine before has suddenly started failing. Here it is:

from transformers import DistilBertModel, DistilBertTokenizer

sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]
# Embed text using a BERT model.
text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)  # error comes here

Error is below:

['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
    load_data()
  File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
    inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable

As you can see, text_tokenizer.tokenize() works fine. I tried force-downloading the tokenizer again and even changing the cache directory, but neither helped.

The code runs fine on another machine (a friend's laptop), and it was also working on mine until I installed torchvision and started using the PIL library for the image part. Now it always fails with this error.

OS: macOS 11.6, using a Conda environment, python=3.9
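One quick way to narrow this down is to check which transformers release is installed, since calling the tokenizer object directly, `tokenizer(...)`, is (as far as I know) only supported from transformers 3.x onward; on a 2.x install the call raises exactly this `TypeError`. A minimal sketch of that check (the 3.0 cutoff is my assumption based on the transformers release notes):

```python
# tokenizer(...) as a direct call is, to my knowledge, only available in
# transformers 3.x and later; 2.x tokenizers are not callable objects.
def supports_direct_call(version_string):
    """Return True if this transformers version should allow tokenizer(...)."""
    major = int(version_string.split(".")[0])
    return major >= 3

print(supports_direct_call("2.11.0"))  # False: would need encode_plus-style APIs
print(supports_direct_call("4.11.2"))  # True: tokenizer(sentences, ...) works
```

In the failing environment, `import transformers; print(transformers.__version__)` shows which side of that line you are on.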

Upvotes: 2

Views: 6066

Answers (1)

amitgh

Reputation: 61

This was a rather easy fix. At some point I had removed the transformers version pin from the environment.yml file, and the environment ended up resolving transformers 2.x with python=3.9, which apparently doesn't allow calling the tokenizer directly. I pinned the version again as transformers=4.11.2 and added the conda-forge channel to the yml file. After that I was able to get past this error.
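For reference, the relevant part of the environment.yml looks roughly like this (a sketch; only the transformers pin and the conda-forge channel come from the fix described above, the rest is placeholder):

```yaml
channels:
  - conda-forge
dependencies:
  - python=3.9
  - transformers=4.11.2
```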

Upvotes: 3
