Reputation: 155
I am trying to do a "Speech2Text" task using a transformer model from Hugging Face.
I tried the code from this documentation on Hugging Face:
import torch
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration
from datasets import load_dataset
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt")
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
transcription = processor.batch_decode(generated_ids)
transcription
but when I run this code in Google Colab, I receive the following error:
SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function
On checking the other lines of the error, it seems that when processor() is called, return_tensors ends up as None even though it is specified as "pt". Because of this, the code tries to import tensorflow, and that is where the error comes from (a known issue).
Full error message:
SystemError Traceback (most recent call last)
<ipython-input-4-2a3231ef630c> in <module>
9 ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
10
---> 11 inputs = processor(ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt")
12
13 generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
10 frames
/usr/local/lib/python3.7/dist-packages/transformers/models/speech_to_text/processing_speech_to_text.py in __call__(self, *args, **kwargs)
51 information.
52 """
---> 53 return self.current_processor(*args, **kwargs)
54
55 def batch_decode(self, *args, **kwargs):
/usr/local/lib/python3.7/dist-packages/transformers/models/speech_to_text/feature_extraction_speech_to_text.py in __call__(self, raw_speech, padding, max_length, truncation, pad_to_multiple_of, return_tensors, sampling_rate, return_attention_mask, **kwargs)
230 pad_to_multiple_of=pad_to_multiple_of,
231 return_attention_mask=return_attention_mask,
--> 232 **kwargs,
233 )
234
/usr/local/lib/python3.7/dist-packages/transformers/feature_extraction_sequence_utils.py in pad(self, processed_features, padding, max_length, truncation, pad_to_multiple_of, return_attention_mask, return_tensors)
161
162 if return_tensors is None:
--> 163 if is_tf_available() and _is_tensorflow(first_element):
164 return_tensors = "tf"
165 elif is_torch_available() and _is_torch(first_element):
/usr/local/lib/python3.7/dist-packages/transformers/utils/generic.py in _is_tensorflow(x)
96
97 def _is_tensorflow(x):
---> 98 import tensorflow as tf
99
100 return isinstance(x, tf.Tensor)
/usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py in <module>
35 import typing as _typing
36
---> 37 from tensorflow.python.tools import module_util as _module_util
38 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
39
/usr/local/lib/python3.7/dist-packages/tensorflow/python/__init__.py in <module>
35
36 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
---> 37 from tensorflow.python.eager import context
38
39 # pylint: enable=wildcard-import
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/context.py in <module>
27 import six
28
---> 29 from tensorflow.core.framework import function_pb2
30 from tensorflow.core.protobuf import config_pb2
31 from tensorflow.core.protobuf import coordination_config_pb2
/usr/local/lib/python3.7/dist-packages/tensorflow/core/framework/function_pb2.py in <module>
14
15
---> 16 from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
17 from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
18 from tensorflow.core.framework import op_def_pb2 as tensorflow_dot_core_dot_framework_dot_op__def__pb2
/usr/local/lib/python3.7/dist-packages/tensorflow/core/framework/attr_value_pb2.py in <module>
14
15
---> 16 from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
17 from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
18 from tensorflow.core.framework import types_pb2 as tensorflow_dot_core_dot_framework_dot_types__pb2
/usr/local/lib/python3.7/dist-packages/tensorflow/core/framework/tensor_pb2.py in <module>
14
15
---> 16 from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
17 from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
18 from tensorflow.core.framework import types_pb2 as tensorflow_dot_core_dot_framework_dot_types__pb2
/usr/local/lib/python3.7/dist-packages/tensorflow/core/framework/resource_handle_pb2.py in <module>
148 ,
149 'DESCRIPTOR' : _RESOURCEHANDLEPROTO,
--> 150 '__module__' : 'tensorflow.core.framework.resource_handle_pb2'
151 # @@protoc_insertion_point(class_scope:tensorflow.ResourceHandleProto)
152 })
SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function
Here is my Colab link for reference.
Let me know what can be done to resolve this error.
Thank you.
Upvotes: 2
Views: 1182
Reputation: 58
import tensorflow as tf
import torch
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration
from datasets import load_dataset
........
Import the tensorflow library first, before importing any torch libraries, even if you are not using it. I don't know the exact reason, but after importing it the code works in the notebook you shared.
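For completeness, here is a sketch of the full snippet with only the import order changed as described above; everything else is exactly the code from the question, so nothing else should need to change.
# import tensorflow before torch/transformers (workaround for the protobuf error in Colab)
import tensorflow as tf
import torch
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration
from datasets import load_dataset

# same model, processor and dataset as in the question
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")

# feature extraction and generation, unchanged
inputs = processor(ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt")
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
transcription = processor.batch_decode(generated_ids)
transcription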
Refer to these links:
Upvotes: 1