Reputation: 9
I tried to run this code:
EN_AR = load_dataset("iwslt2017", "iwslt2017-ar-en", split="train").select(range(2000))

def extract_languages(examples):
    inputs = [ex["ar"] for ex in examples['translation']]
    target = [ex["en"] for ex in examples['translation']]
    return {"inputs":inputs,"targets":target}

EN_AR = EN_AR.map(extract_languages,batched=True, remove_columns=["translation"])
from transformers import AutoTokenizer, MBart50TokenizerFast

model_name = "facebook/mbart-large-50"
tokenizer = AutoTokenizer.from_pretrained(model_name)
maxL = 128

def preprocess_func(examples):
    model_inputs = tokenizer(examples["inputs"],max_length=maxL,truncation=True)
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(examples["targets"],max_length=maxL,truncation=True)
    model_inputs["labels"]= labels["input_ids"]
    return model_name

tokenized_datasets = EN_AR.map(preprocess_func, batched = True, remove_columns=["inputs","targets"])
but it keeps telling me
TypeError                                 Traceback (most recent call last)
in <cell line: 15>()
     13     return model_name
     14
---> 15 tokenized_datasets = EN_AR.map(preprocess_func, batched = True, remove_columns=["inputs","targets"])

10 frames
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py in convert_ids_to_tokens(self, ids, skip_special_tokens)
    387         tokens = []
    388         for index in ids:
--> 389             index = int(index)
    390             if skip_special_tokens and index in self.all_special_ids:
    391                 continue

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
I took this code from the Hugging Face website for translation preprocessing, but I don't know why it does not work for me.
Upvotes: 0
Views: 609
Reputation: 20500
Welcome.
The question suggests you are new to Python, so forgive me if I cover some basics.
Each variable and expression in Python has a type. For example, you can type type(5) and get <class 'int'>. Functions will often return the value None, of type <class 'NoneType'>, when there is no answer.
The int() function converts a value to an integer. For example, int(3), int(3.5), and int("3") are all 3. None cannot be converted to an integer, so int(None) raises the TypeError you are seeing.
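The conversions above can be checked directly in a Python session. This is a minimal sketch using only built-ins; the variable name error_message is just for illustration.

```python
# Inspect the types mentioned above.
print(type(5))       # <class 'int'>
print(type(None))    # <class 'NoneType'>

# int() accepts ints, floats, and numeric strings.
print(int(3), int(3.5), int("3"))   # 3 3 3

# None has no integer interpretation, so int(None) raises TypeError.
try:
    int(None)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

The exact wording of the message can differ a little between Python versions, but it always names 'NoneType' as the offending type.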
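The failing call to int() is not in your own code. According to the traceback, it is inside convert_ids_to_tokens in transformers/tokenization_utils_fast.py, so one of the ids passed to that method was None. I suggest you use print statements or a debugger to look at the dataset you are loading and at what the tokenizer receives, specifically for a place where an integer is expected and None shows up instead. Also note that preprocess_func returns model_name rather than model_inputs, which looks unintended.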
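As one way to do that kind of check, the sketch below scans a list of ids for None before anything calls int() on them. The helper names find_none_positions and to_ints, and the sample ids, are hypothetical and made up for illustration; they are not part of transformers or datasets.

```python
from typing import List, Optional, Sequence


def find_none_positions(ids: Sequence[Optional[int]]) -> List[int]:
    """Return the positions in `ids` whose value is None."""
    return [pos for pos, value in enumerate(ids) if value is None]


def to_ints(ids: Sequence[Optional[int]]) -> List[int]:
    """Convert every id with int(), failing early with a readable message."""
    missing = find_none_positions(ids)
    if missing:
        raise ValueError(f"ids contain None at positions {missing}")
    return [int(value) for value in ids]


# Made-up sample data: the second id is missing.
sample_ids = [250001, None, 2]
print(find_none_positions(sample_ids))  # [1]
```

In the traceback above, the int() call sits in convert_ids_to_tokens, so the value to inspect is whatever list of ids reaches that method. With mBART-50 it may also be worth checking whether the tokenizer's src_lang and tgt_lang are set, since an unset language code is one way a None could get in; that is a guess, not something confirmed here.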
Add a comment about how it worked!
Upvotes: -1