ayo joe

Reputation: 11

AttributeError: 'DatasetDict' object has no attribute 'to_tf_dataset'

I am fine-tuning a model on my data for an NLP project using the Hugging Face libraries. Here is the code I am having trouble with. Has anyone been able to solve this problem?

from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")

tf_dataset = testdata.to_tf_dataset(
    columns=["input_ids", "token_type_ids", "attention_mask"],
    label_cols=["labels"],
    batch_size=2,
    collate_fn=data_collator,
    shuffle=True
)

NB: I have seen suggestions to upgrade to the latest versions. I have done that, but the problem persists.

Upvotes: 1

Views: 2590

Answers (2)

Kautilya Kondragunta

Reputation: 155

In your case, testdata is a DatasetDict that holds your train split, and DatasetDict has no to_tf_dataset method. testdata['train'], however, is a Dataset, so testdata['train'].to_tf_dataset(...) will work as expected.

Upvotes: 0

thank_for_this
thank_for_this

Reputation: 55

I faced the same problem. In my case I was working with a CSV file, and I used the following code to load it:

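from datasets import load_dataset

# The file path must be passed as data_files, not as the second positional argument
dataset_training = load_dataset("csv", data_files="file.csv")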

Calling to_tf_dataset on the result then raised:

AttributeError: 'DatasetDict' object has no attribute 'to_tf_dataset'

To work around this, I loaded the file into a pandas DataFrame and then converted it with Dataset.from_pandas:

import pandas as pd
from datasets import Dataset

# Read the CSV into a pandas DataFrame, then wrap it as a single Dataset
data = pd.read_csv("file.csv")
dataset = Dataset.from_pandas(data)

After that, the to_tf_dataset method worked correctly. I have no explanation for why, but it worked for me.

Upvotes: 2
