kkulkarn

Reputation: 345

Dropping multiple columns from dataframe

I have the following code snippet

{dataset: https://www.internationalgenome.org/data-portal/sample}

    genome_data = pd.read_csv('../genome')
    genome_data_columns = genome_data.columns

    genPredict = genome_data[genome_data_columns[genome_data_columns != 'Geuvadis']]

This drops the column Geuvadis. Is there a way I can exclude more than one column?

Upvotes: 0

Views: 206

Answers (3)

Tom Harvey

Reputation: 4370

Is it ok for you to not read them in the first place?

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

The ‘usecols’ option in read_csv lets you specify the columns of data you want to include in the DataFrame.

Venkatesh-PrasadRanganath's answer is the correct one for how to drop multiple columns.

But if you want to avoid reading data into memory that you’re not going to use, genome_data = pd.read_csv('../genome', usecols=["only", "required", "columns"]) is the syntax to use.
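If it is easier to name the columns you want to leave out than the ones you want to keep, usecols also accepts a callable that is evaluated against each column name. Here is a minimal sketch of that; the in-memory CSV and every column name other than Geuvadis are made up for illustration and stand in for the real '../genome' file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the '../genome' file; the column names other
# than 'Geuvadis' are invented for this example.
csv_text = "Sample,Sex,Geuvadis,C2\nHG00096,male,1,0\nHG00097,female,0,1\n"

# Columns we do not want to load at all.
excluded = {"Geuvadis", "C2"}

# When usecols is a callable, pandas keeps the columns for which it
# returns True, so the excluded columns are never read into the DataFrame.
genome_data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name not in excluded,
)

print(genome_data.columns.tolist())
```

With a real file you would pass the path instead of the io.StringIO object.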

Upvotes: 1

David Erickson

Reputation: 16683

I think @Venkatesh-PrasadRanganath's answer is better, but taking a similar approach to your attempt, this is how I would do it:

  1. Identify all of the columns with columns.to_list()
  2. Create a list of columns to be excluded
  3. Subtract the columns to be excluded from the full list with list(set() - set())
  4. Select the remaining columns.

    import pandas as pd

    genome_data = pd.read_csv('../genome')
    all_genome_data_columns = genome_data.columns.to_list()
    excluded_genome_data_columns = ['a', 'b', 'c']  # Type in the columns that you want to exclude here.
    # Note: a set difference does not preserve the original column order.
    genome_data_columns = list(set(all_genome_data_columns) - set(excluded_genome_data_columns))
    genPredict = genome_data[genome_data_columns]
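
If the column order matters to you, a boolean mask is one alternative, since sets are unordered. This is only a sketch under assumptions: the small DataFrame and the column names other than Geuvadis are invented so the example runs on its own. It extends the `!=` comparison from the question to several names via Index.isin.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from '../genome'.
genome_data = pd.DataFrame(
    {"Sample": ["HG00096"], "Geuvadis": [1], "Sex": ["male"], "C2": [0]}
)
excluded_genome_data_columns = ["Geuvadis", "C2"]

# True for every column that is NOT in the exclusion list.
keep_mask = ~genome_data.columns.isin(excluded_genome_data_columns)

# Select the kept columns; their original left-to-right order is preserved.
genPredict = genome_data.loc[:, keep_mask]

print(genPredict.columns.tolist())
```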
    

Upvotes: 0

You could use DataFrame.drop like genome_data.drop(['Geuvadis', 'C2', ...], axis=1).
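
A small runnable sketch of that, assuming a toy DataFrame in place of the real data (every column name except Geuvadis is made up here). The columns= keyword is equivalent to passing the labels with axis=1, and errors="ignore" skips labels that are not present instead of raising.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from '../genome'.
genome_data = pd.DataFrame(
    {"Sample": ["HG00096"], "Geuvadis": [1], "Sex": ["male"], "C2": [0]}
)

# drop returns a new DataFrame; genome_data itself is left unchanged.
genPredict = genome_data.drop(columns=["Geuvadis", "C2"])

# With errors="ignore", a label that does not exist is silently skipped.
genPredict_safe = genome_data.drop(
    columns=["Geuvadis", "not_a_column"], errors="ignore"
)

print(genPredict.columns.tolist())
print(genPredict_safe.columns.tolist())
```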

Upvotes: 1
