foy

Reputation: 427

PyTorch: delete feature columns from a dataset

I have the dataset below and would like to delete features A through F. The dataset was converted from pandas DataFrames:

import datasets
from datasets import Dataset

# X_train, X_test and X_val are pandas DataFrames
dataset = datasets.DatasetDict({"train": Dataset.from_pandas(X_train),
                                "test": Dataset.from_pandas(X_test),
                                "val": Dataset.from_pandas(X_val),
                                })

Printing the dataset gives the following:

DatasetDict({
train: Dataset({
    features: ['A', 'B', 'C', 'D', 'E', 'F', 'text', '__index_level_0__', 'label'],
    num_rows: 1173
})
test: Dataset({
    features: ['A', 'B', 'C', 'D', 'E', 'F', 'text', '__index_level_0__', 'label'],
    num_rows: 1369
})
val: Dataset({
    features: ['A', 'B', 'C', 'D', 'E', 'F', 'text', '__index_level_0__', 'label'],
    num_rows: 1369
})

})

The result I want looks like this:

DatasetDict({
train: Dataset({
    features: ['text', '__index_level_0__', 'label'],
    num_rows: 1173
})
test: Dataset({
    features: ['text', '__index_level_0__', 'label'],
    num_rows: 1369
})
val: Dataset({
    features: ['text', '__index_level_0__', 'label'],
    num_rows: 1369
})

})

Upvotes: 0

Views: 2114

Answers (1)

Timbus Calin

Reputation: 15023

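What you need is the remove_columns() method from the datasets library. It works on any Dataset object, and also on a DatasetDict, where the columns are removed from every split. This is useful if you want to drop columns at this level rather than in pandas beforehand. The method returns a new object instead of modifying the existing one, so assign the result back: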

dataset = dataset.remove_columns("label")

For your case, it would be:

dataset = dataset.remove_columns(['A', 'B', 'C', 'D', 'E', 'F'])

You can have a look here: https://huggingface.co/docs/datasets/process

Upvotes: 1
