Reputation: 103
I have a number of datasets, which I create from a dictionary like so:
from datasets import Dataset, DatasetDict, DatasetInfo

info = DatasetInfo(
    description="my happy lil dataset",
    version="0.0.1",
    homepage="https://www.myhomepage.co.uk",
)
train_dataset = Dataset.from_dict(prepare_data(data["train"]), info=info)
test_dataset = Dataset.from_dict(prepare_data(data["test"]), info=info)
validation_dataset = Dataset.from_dict(prepare_data(data["validation"]), info=info)
I then combine these into a DatasetDict.
# Create a DatasetDict
dataset = DatasetDict(
    {"train": train_dataset, "test": test_dataset, "validation": validation_dataset}
)
So far, so good. If I access dataset['train'].info.description
I see the expected result, "my happy lil dataset".
I then push it to the Hub:
dataset.push_to_hub(f"{organization}/{repo_name}", commit_message="Some commit message")
And this succeeds too.
However, when I pull the dataset back down from the Hub and access its info, I get an empty string instead of my dataset's description:
from datasets import load_dataset

pulled_data = load_dataset(f"{organization}/{repo_name}", use_auth_token=True)
# I expect the following to print out "my happy lil dataset"
print(pulled_data["train"].info.description)
# Instead, the description is an empty string ('')
Am I loading my data from the Hub incorrectly? Am I somehow pushing only the data and not the info? I feel like I'm missing something obvious, but I'm not sure what.
Upvotes: 0
Views: 64
Reputation: 34
It might be due to version caching of the dataset. Without an explicit version attribute, the library's default versioning may not preserve all metadata, such as the description.
Try defining VERSION in a dataset builder class, for example:
import datasets


class My_dataset(datasets.GeneratorBasedBuilder):
    VERSION = datasets.Version("1.0.0")

    def _info(self) -> datasets.DatasetInfo:
        return datasets.DatasetInfo(
            description="my happy lil dataset",
            features=datasets.Features(
                {
                    # List every feature your dataset provides, with its type
                    "f1": datasets.Value("string"),
                    "f2": datasets.Value("string"),
                }
            ),
            homepage="https://www.myhomepage.co.uk",
        )

    # A complete GeneratorBasedBuilder must also implement
    # _split_generators() and _generate_examples().
Upvotes: 0