Lcat

Reputation: 929

Formatting question/answer data for Hugging Face

I have a CSV with fields for id, context, question, answer_start, and text. I would like to import this into Hugging Face as a dataset for Q&A training, in a format similar to SQuAD.

Hugging Face expects the answer_start and text fields to be combined into a single answers field, and I'm having trouble getting the data into a form that loads properly.

What I've tried: loading the CSV into pandas and then running

    df['answers'] = df.apply(lambda x: {"answer_start": x.answer_start, "text": x.text}, axis=1)
    df = df[['id','question','context','answers']]
    train_dataset = datasets.Dataset.from_pandas(df)

Any suggestions?

Upvotes: 0

Views: 110

Answers (1)

Lcat

Reputation: 929

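I got it working by wrapping each value in a list, since the SQuAD format stores answer_start and text as lists (a question can have more than one answer):

    df['answers'] = df.apply(lambda x: {"answer_start": [x.answer_start], "text": [x.text]}, axis=1)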
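
For anyone landing here later, here is a rough end-to-end sketch of the same idea without pandas. The sample rows and the helper names (to_squad_record, load_records, to_dataset) are made up for illustration, and the last step assumes a version of datasets recent enough to have Dataset.from_list, so treat it as a starting point rather than the definitive way to do it.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
SAMPLE_CSV = """id,context,question,answer_start,text
q1,Paris is the capital of France.,What is the capital of France?,0,Paris
q2,Water boils at 100 degrees Celsius.,At what temperature does water boil?,15,100 degrees Celsius
"""


def to_squad_record(row):
    """Turn one flat CSV row into a SQuAD-style record.

    answer_start and text are each wrapped in a list and nested
    under a single 'answers' key.
    """
    return {
        "id": row["id"],
        "question": row["question"],
        "context": row["context"],
        "answers": {
            "answer_start": [int(row["answer_start"])],
            "text": [row["text"]],
        },
    }


def load_records(csv_file):
    """Read every row of an open CSV file into SQuAD-style records."""
    return [to_squad_record(row) for row in csv.DictReader(csv_file)]


def to_dataset(records):
    """Build a Hugging Face Dataset from the records.

    Assumption: the installed `datasets` provides Dataset.from_list.
    The import is local so the code above runs without `datasets`.
    """
    from datasets import Dataset

    return Dataset.from_list(records)


records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0]["answers"])
```

With a real file you would pass open("your_file.csv", newline="") to load_records instead of the StringIO object, then call to_dataset(records). It is also worth checking that context[answer_start:answer_start + len(text)] actually equals text for each row, since misaligned offsets tend to cause silent problems during tokenization.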

Upvotes: 0
