Reputation: 929
I have a CSV with fields for id, context, question, answer_start, and text. I would like to import this into Hugging Face as a dataset for Q&A training, in a format similar to SQuAD.
Hugging Face expects the answer_start and text fields to be coupled into a single answers field, and I'm having trouble getting the data into a form that loads properly.
What I've tried: loading the CSV into pandas and then:
df['answers'] = df.apply(lambda x: {"answer_start": x.answer_start, "text": x.text}, axis=1)
df = df[['id','question','context','answers']]
train_dataset = datasets.Dataset.from_pandas(df)
Any suggestions?
Upvotes: 0
Views: 110
Reputation: 929
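I got it working as follows, wrapping answer_start and text in lists, which is how the answers field is structured in SQuAD:
df['answers'] = df.apply(lambda x: {"answer_start": [x.answer_start], "text": [x.text]}, axis=1)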
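For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The two sample rows and the helper names build_squad_frame and to_dataset are made up for illustration and are not part of my real data; the int()/str() casts are an extra precaution I added so the values are plain Python types.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
SAMPLE_CSV = """id,context,question,answer_start,text
1,Paris is the capital of France.,What is the capital of France?,0,Paris
2,The Nile flows through Egypt.,Which river flows through Egypt?,4,Nile
"""


def build_squad_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with answer_start/text merged into 'answers'."""
    out = df.copy()
    # SQuAD-style answers hold lists, even when there is only one answer.
    out["answers"] = out.apply(
        lambda row: {
            "answer_start": [int(row.answer_start)],
            "text": [str(row.text)],
        },
        axis=1,  # apply the lambda to each row rather than each column
    )
    return out[["id", "question", "context", "answers"]]


def to_dataset(df: pd.DataFrame):
    """Convert the frame to a Hugging Face Dataset (requires `datasets`)."""
    from datasets import Dataset  # imported lazily so the rest runs without it

    return Dataset.from_pandas(df)


squad_df = build_squad_frame(pd.read_csv(io.StringIO(SAMPLE_CSV)))
```

Calling to_dataset(squad_df) should then give a dataset whose answers column has the same shape as in SQuAD, assuming the datasets package is installed.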
Upvotes: 0