Dante

ModuleNotFoundError with python transformers library despite it being installed in venv when running invoke task

As the title says, I'm trying to import a module that uses transformers, but it throws the following error:

(tf2venv) dante@dante-Inspiron-5570:~/projects/classification$ inv process-pdf test.pdf
Using TensorFlow backend.
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/dante/projects/classification/venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/dante/projects/classification/venv/bin/inv", line 8, in <module>
    sys.exit(program.run())
  File "/home/dante/projects/classification/venv/lib/python3.7/site-packages/invoke/program.py", line 373, in run
    self.parse_collection()
  File "/home/dante/projects/classification/venv/lib/python3.7/site-packages/invoke/program.py", line 465, in parse_collection
    self.load_collection()
  File "/home/dante/projects/classification/venv/lib/python3.7/site-packages/invoke/program.py", line 696, in load_collection
    module, parent = loader.load(coll_name)
  File "/home/dante/projects/classification/venv/lib/python3.7/site-packages/invoke/loader.py", line 76, in load
    module = imp.load_module(name, fd, path, desc)
  File "/home/dante/.pyenv/versions/3.7.0/lib/python3.7/imp.py", line 235, in load_module
    return load_source(name, filename, file)
  File "/home/dante/.pyenv/versions/3.7.0/lib/python3.7/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/dante/projects/classification/tasks.py", line 10, in <module>
    from app import ClassifyDocument
  File "/home/dante/projects/classification/app.py", line 15, in <module>
    from docType_classification import classify, grouping, utils
  File "/home/dante/projects/classification/docType_classification/classify.py", line 13, in <module>
    import common.hybrid as hybrid
  File "/home/dante/projects/classification/common/hybrid.py", line 3, in <module>
    import transformers
ModuleNotFoundError: No module named 'transformers'

The code in common/hybrid.py is as follows:

import transformers
from tokenizers import BertWordPieceTokenizer
from tqdm import tqdm  # tqdm(...) below needs the function, not the module
import numpy as np

def build_tokenizer():
    # load the real tokenizer
    tokenizer = transformers.DistilBertTokenizer.from_pretrained(
        "distilbert-base-uncased"
    )
    # Save the loaded tokenizer locally
    tokenizer.save_pretrained(".")
    # Reload it with the huggingface tokenizers library
    # distilbert-base-uncased has a lowercase vocabulary, so lowercase the input too
    hugging_face_tokenizer = BertWordPieceTokenizer("vocab.txt", lowercase=True)
    return hugging_face_tokenizer


def encode(texts, tokenizer, chunk_size=256, maxlen=512):
    tokenizer.enable_truncation(max_length=maxlen)
    tokenizer.enable_padding(length=maxlen)
    all_ids = []

    print(len(texts))
    for i in tqdm(range(0, len(texts), chunk_size)):
        text_chunk = texts[i : i + chunk_size].tolist()
        encs = tokenizer.encode_batch(text_chunk)
        all_ids.extend([enc.ids for enc in encs])

    return np.array(all_ids)
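encode() splits the texts into chunks of chunk_size, and the tokenizer truncates and pads every encoding to maxlen so all rows have the same width. A plain-Python sketch of that chunk/truncate/pad logic, which runs without transformers or tokenizers installed: toy_tokenize is a hypothetical stand-in (it maps each word to its length, purely for illustration), PAD_ID = 0 is an assumption, and the sketch tokenizes texts one at a time inside each chunk instead of calling encode_batch, returning a list of lists rather than a NumPy array.

```python
from typing import Callable, List, Sequence

PAD_ID = 0  # assumption: padding id for this toy example


def encode_in_chunks(
    texts: Sequence[str],
    tokenize: Callable[[str], List[int]],
    chunk_size: int = 256,
    maxlen: int = 512,
) -> List[List[int]]:
    """Tokenize texts chunk by chunk, truncating/padding every row to maxlen."""
    all_ids: List[List[int]] = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start : start + chunk_size]
        for text in chunk:
            ids = tokenize(text)[:maxlen]               # truncate long inputs
            ids = ids + [PAD_ID] * (maxlen - len(ids))  # pad short inputs
            all_ids.append(ids)
    return all_ids


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical tokenizer: one id per whitespace-separated word (its length)."""
    return [len(word) for word in text.split()]


if __name__ == "__main__":
    rows = encode_in_chunks(["a bb ccc", "dddd"], toy_tokenize, chunk_size=1, maxlen=4)
    print(rows)  # [[1, 2, 3, 0], [4, 0, 0, 0]]
```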

It is imported in classify.py as:

import common.hybrid as hybrid

I can run this file directly with

python3 common/hybrid.py

without any errors.

However, when I run an invoke task (tasks.py is located in the root project directory) with

invoke process-data

I get the ModuleNotFoundError as soon as execution reaches the transformers import.
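Running python3 common/hybrid.py and running inv do not necessarily use the same interpreter. A minimal diagnostic sketch (not part of the original project; the helper name interpreter_info is hypothetical) that could be run both ways, for example by temporarily calling it at the top of tasks.py and of common/hybrid.py, reports which interpreter and environment are active:

```python
import sys


def interpreter_info() -> dict:
    """Describe the interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Inside a venv, sys.prefix points at the venv, while sys.base_prefix
        # points at the Python installation the venv was created from.
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the two runs print different executable or prefix values, they are looking for packages in different site-packages directories.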

Note that if I add

import tensorflow

above the transformers import, tensorflow is imported correctly, and the error isn't thrown until

import transformers
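Since tensorflow imports but transformers does not, it can help to see where, if anywhere, the running interpreter would load each package from. importlib.util.find_spec locates a top-level module without executing it; the helper name module_origins is hypothetical:

```python
import importlib.util
from typing import Dict, Iterable, Optional


def module_origins(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to the file it would be imported from.

    None means the current interpreter cannot find the module on sys.path.
    Only the module's location is looked up; its code is not executed.
    """
    origins: Dict[str, Optional[str]] = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    for name, origin in module_origins(["tensorflow", "transformers"]).items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

If tensorflow resolves to a site-packages directory belonging to a different environment than the one transformers was installed into, the two packages are not being looked up in the same place.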

pip freeze output:

absl-py==0.12.0
appdirs==1.4.4
astunparse==1.6.3
attrs==20.3.0
backcall==0.2.0
bearbones==2.300
black==20.8b1
boto3==1.9.85
botocore==1.12.253
cachetools==4.2.1
certifi==2020.12.5
cfgv==3.2.0
chardet==3.0.4
click==7.1.2
decorator==5.0.6
distlib==0.3.1
docutils==0.15.2
fancycompleter==0.9.1
filelock==3.0.12
flake8==3.9.0
fuzzysearch==0.7.3
gast==0.3.3
google-auth==1.28.1
google-auth-oauthlib==0.4.4
google-pasta==0.2.0
grpcio==1.37.0
h5py==2.10.0
identify==2.2.3
idna==2.8
invoke==1.5.0
ipython==7.14.0
ipython-genutils==0.2.0
isort==5.8.0
jedi==0.18.0
jmespath==0.10.0
joblib==1.0.1
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
lxml==4.6.3
Markdown==3.3.4
mccabe==0.6.1
more-itertools==8.7.0
mypy-extensions==0.4.3
nodeenv==1.6.0
numpy==1.20.2
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.9
pandas==1.1.5
parso==0.8.2
pathspec==0.8.1
pdbpp==0.10.2
pdf2image==1.10.0
pdftotext==2.1.5
pexpect==4.8.0
pickleshare==0.7.5
pikepdf==1.7.1
Pillow==8.2.0
pipdeptree==2.0.0
pluggy==0.13.1
pre-commit==2.12.0
prompt-toolkit==3.0.18
protobuf==3.15.8
ptyprocess==0.7.0
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.7.0
pyflakes==2.3.1
Pygments==2.8.1
pyparsing==2.4.7
PyPDF2==1.26.0
pyrepl==0.9.0
pytest==5.3.5
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
redis==3.3.11
regex==2021.4.4
requests==2.21.0
requests-oauthlib==1.3.0
rsa==4.7.2
s3transfer==0.1.13
sacremoses==0.0.44
scipy==1.4.1
sentencepiece==0.1.95
six==1.15.0
tenacity==6.0.0
tensorboard==2.2.2
tensorboard-plugin-wit==1.8.0
tensorflow==2.2.0
tensorflow-estimator==2.2.0
termcolor==1.1.0
tokenizers==0.10.2
toml==0.10.2
tqdm==4.60.0
traitlets==5.0.5
transformers==4.4.2
typed-ast==1.4.3
typing-extensions==3.7.4.3
urllib3==1.24.1
virtualenv==20.4.3
wcwidth==0.2.5
Werkzeug==1.0.1
wmctrl==0.3
wrapt==1.12.1

Other info:

(tf2venv) dante@dante-Inspiron-5570:~/projects/classification$ which python
/home/dante/projects/classification/tf2venv/bin/python
(tf2venv) dante@dante-Inspiron-5570:~/projects/classification$ which inv
/home/dante/projects/classification/tf2venv/bin/inv
(tf2venv) dante@dante-Inspiron-5570:~/projects/classification$ python3 --version
Python 3.8.0
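which inv only reports where the inv script lives; the interpreter it launches is normally the one named on the script's first #! line, which pip writes when it installs the console script. A small sketch for reading that line (the helper names read_shebang and console_script_interpreter are hypothetical):

```python
import shutil
from typing import Optional


def read_shebang(path: str) -> Optional[str]:
    """Return the command named on a script's '#!' line, or None if absent."""
    with open(path, "rb") as fh:
        first = fh.readline().decode("utf-8", errors="replace").strip()
    return first[2:].strip() if first.startswith("#!") else None


def console_script_interpreter(name: str) -> Optional[str]:
    """Shebang command of the console script `name` found on PATH, if any."""
    path = shutil.which(name)
    return read_shebang(path) if path else None


if __name__ == "__main__":
    print("inv ->", console_script_interpreter("inv"))
```

For comparison, the traceback above runs /home/dante/projects/classification/venv/bin/inv with python3.7 packages, while which inv and python3 --version point at tf2venv and Python 3.8.0. For very long interpreter paths pip may write a #!/bin/sh wrapper instead, in which case the first line does not name Python directly.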

Note that there are no circular imports, and I've tried various versions of transformers (v3 and v4).

Everything was installed with pip3, and the venv was created with

python3 -m venv tf2venv

I've tried deleting the venv and reinstalling everything several times, but nothing works. Is something missing that causes this ModuleNotFoundError with transformers?

My requirements.txt is:

bearbones>=2
fuzzysearch~=0.7.3
ipython~=7.14.0
Keras~=2.3.0
pdf2image~=1.10.0
pikepdf~=1.7.0
tenacity~=6.0.0
tensorflow==2.2.0
transformers==3.0.2
pandas~=1.1.5
pytest~=5.3.2
pdftotext~=2.1.4
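The requirements.txt above pins transformers==3.0.2, whereas the pip freeze output lists transformers==4.4.2. A sketch for comparing exact == pins with what the current interpreter reports as installed, using importlib.metadata (standard library since Python 3.8). The helper names are hypothetical, and specifiers such as ~= or >= are deliberately skipped:

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Dict, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Collect exact 'name==version' pins; other specifiers are ignored."""
    pins: Dict[str, str] = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def compare_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each pinned package to (pinned, installed); installed is None if missing."""
    result: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        result[name] = (pinned, installed)
    return result


if __name__ == "__main__":
    with open("requirements.txt") as fh:
        for name, (pinned, installed) in compare_pins(parse_pins(fh.read())).items():
            flag = "" if pinned == installed else "  <-- mismatch"
            print(f"{name}: pinned {pinned}, installed {installed}{flag}")
```

Because importlib.metadata queries the interpreter that runs the script, the result reflects that interpreter's environment only, which is why running it with the same python that invoke uses matters.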
