Noam Gershi

Reputation: 87

Loading huggingface dataset from in-memory text

I have JSON-formatted text in memory, and I am trying to load a HuggingFace dataset directly from it.

If I save the text to a file, I can load the dataset using HuggingFace's load_dataset:

from datasets import load_dataset
dataset = load_dataset('json', data_files='my_file.json')

See also: https://huggingface.co/docs/datasets/v1.11.0/loading_datasets.html#from-local-files

Can I load the dataset directly from the in-memory text, without saving it to a file?

Upvotes: 0

Views: 164

Answers (1)

Jules Gagnon-Marchand

Reputation: 3801

Parse the JSON string into a dict, then build the dataset object yourself:


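import json
import datasets

the_json_string = "..."  # your in-memory JSON text

# json.loads parses the string into Python objects (a dict for a JSON object)
the_dict = json.loads(the_json_string)

# from_dict expects a dict mapping each column name to a list of values
dataset_object = datasets.Dataset.from_dict(the_dict)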

Look at the documentation for datasets.Dataset.from_dict for exactly how to make this work:

https://huggingface.co/docs/datasets/v2.2.1/en/package_reference/main_classes#datasets.Dataset.from_dict
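
One catch with the approach above: Dataset.from_dict wants column-oriented data (column name -> list of values), while JSON text is often row-oriented (a list of records, or one JSON object per line). Below is a minimal sketch of a conversion step, assuming the text is in one of those shapes. The helper name json_text_to_columns and the sample text are made up for illustration and are not part of the datasets library; records missing a key are filled with None.

```python
import json


def json_text_to_columns(text):
    """Parse in-memory JSON text into a {column: [values]} dict.

    Hypothetical helper. Accepts a column-oriented JSON object,
    a JSON array of records, or JSON Lines (one object per line).
    """
    text = text.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: assume JSON Lines.
        parsed = [json.loads(line) for line in text.splitlines() if line.strip()]

    if isinstance(parsed, dict):
        # Already column-oriented if every value is a list.
        if all(isinstance(value, list) for value in parsed.values()):
            return parsed
        # Otherwise treat the object as a single record.
        parsed = [parsed]

    # Collect column names in first-seen order.
    keys = []
    for record in parsed:
        for key in record:
            if key not in keys:
                keys.append(key)

    # Pivot rows into columns; missing keys become None.
    return {key: [record.get(key) for record in parsed] for key in keys}


# Made-up sample data in JSON Lines form.
sample_text = '{"text": "hello", "label": 0}\n{"text": "world", "label": 1}'
columns = json_text_to_columns(sample_text)

try:
    from datasets import Dataset

    dataset = Dataset.from_dict(columns)
except ImportError:
    # The datasets library is not installed; the conversion still works.
    dataset = None
```

Newer releases of datasets may also provide Dataset.from_list for row-oriented data, which would make the pivot step unnecessary; check the documentation for the version you have installed.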

Upvotes: 1
