oettam_oisolliv

Reputation: 226

Arrow-related error when pushing a dataset to the Hugging Face Hub

I have a problem with my dataset:

The (future) dataset is a pandas DataFrame that I loaded from a pickle file; the DataFrame itself behaves correctly. My code is:

from datasets import Dataset

dataset = Dataset.from_pandas(df)
dataset.push_to_hub("username/my_dataset", private=True)
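One thing worth ruling out before blaming Arrow is a "string" column that, after unpickling, actually holds a mix of Python types. The sketch below is only a diagnostic under that assumption, not a claim that this is the cause here: it round-trips a small hypothetical DataFrame through a pickle file (the file name and columns are made up) and counts the Python types found in each non-numeric column.

```python
import os
import tempfile
from collections import Counter

import pandas as pd


def value_types(df: pd.DataFrame) -> dict:
    """Count the Python type names of the values in each non-numeric column."""
    return {
        col: Counter(type(value).__name__ for value in df[col])
        for col in df.columns
        if not pd.api.types.is_numeric_dtype(df[col])
    }


# Hypothetical data: the "sentence" column silently mixes str, None and int.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_dataset.pkl")  # hypothetical file name
    pd.DataFrame({"sentence": ["hello", None, 3], "label": [0, 1, 2]}).to_pickle(path)
    df = pd.read_pickle(path)

print(value_types(df))
```

On the real data, a column whose counter shows anything other than `str` (and perhaps `NoneType` for missing values) is a candidate for cleaning before building the Dataset.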

Because I thought it was pandas' fault, I also tried:

dataset = Dataset.from_dict(df_sentences.to_dict(orient='list'))
dataset.push_to_hub("username/my_dataset", private=True)
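For reference, `to_dict(orient='list')` turns the DataFrame into a mapping from each column name to a list of that column's values, which is the column-oriented shape `Dataset.from_dict` accepts. A small illustration with hypothetical columns:

```python
import pandas as pd

df = pd.DataFrame({"sentence": ["a", "b"], "label": [0, 1]})

# orient="list" gives {column name: list of that column's values}
columns = df.to_dict(orient="list")
print(columns)  # {'sentence': ['a', 'b'], 'label': [0, 1]}
```

Because the values still go through Arrow type inference on the `datasets` side, this path bypasses pandas dtypes but not Arrow itself, which is consistent with it failing the same way.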

I also tried loading it from a file.

The error I get is:

ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: string

My dataset has 4 string columns and one integer column, with around 3,600 rows.

Upvotes: 2

Views: 363

Answers (1)

SultanOrazbayev

Reputation: 16571

Without a reproducible sample it is hard to test, but one option is to convert the string columns to the string[pyarrow] dtype:

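from datasets import Dataset

# Map each string column to the Arrow-backed string dtype
dtypes = {
    'column_a': 'string[pyarrow]',
    'col_b': 'string[pyarrow]',
    # ... (remaining string columns)
}

df_converted = df.astype(dtypes)

# proceed with the push
dataset = Dataset.from_pandas(df_converted)
dataset.push_to_hub("username/my_dataset", private=True)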

If possible, I would also upgrade to the latest versions, especially of pyarrow and pandas.
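Before and after upgrading, it helps to confirm which versions the running interpreter actually sees, since a notebook kernel can differ from the shell where `pip` was run. A small sketch using only the standard library; including `datasets` in the list is an assumption based on the `push_to_hub` call in the question:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["pandas", "pyarrow", "datasets"]))
```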

Upvotes: 1
