Reputation: 345
I have the following code snippet (dataset: https://www.internationalgenome.org/data-portal/sample):
import pandas as pd
genome_data = pd.read_csv('../genome')
genome_data_columns = genome_data.columns
genPredict = genome_data[genome_data_columns[genome_data_columns != 'Geuvadis']]
This drops the column Geuvadis. Is there a way I can exclude more than one column?
Upvotes: 0
Views: 206
Reputation: 4370
Is it ok for you to not read them in the first place?
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
The ‘usecols’ option in read_csv lets you specify the columns of data you want to include in the DataFrame.
Venkatesh-PrasadRanganath's answer is the correct way to drop multiple columns.
But if you want to avoid reading data into memory that you're not going to use, genome_data = pd.read_csv('../genome', usecols=["only", "required", "columns"]) is the syntax to use.
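As a rough sketch of the usecols idea: the in-memory CSV and the column names Sample, Sex and C2 below are made up for illustration and stand in for the real '../genome' file. usecols also accepts a callable, which may be handy when it is easier to name the columns to skip than the ones to keep.

```python
import io

import pandas as pd

# Hypothetical stand-in for '../genome'; these column names are assumptions.
csv_text = "Sample,Sex,Geuvadis,C2\nHG00096,male,1,0\nHG00097,female,1,1\n"

# Read only the columns that are listed by name.
kept = pd.read_csv(io.StringIO(csv_text), usecols=["Sample", "Sex"])

# Or pass a callable: it is called with each column name, and the column
# is read only when the callable returns True.
excluded = {"Geuvadis", "C2"}
skipped = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name not in excluded,
)
```

In both cases the unwanted columns are never loaded into the DataFrame.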
Upvotes: 1
Reputation: 16683
I think @Venkatesh-PrasadRanganath's answer is better, but here is how I would do it with an approach closer to your attempt:
1. Get all the column names with columns.to_list().
2. Remove the excluded names with list(set() - set()).
3. Select the remaining columns.
genome_data = pd.read_csv('../genome')
all_genome_data_columns = genome_data.columns.to_list()
excluded_genome_data_columns = ['a', 'b', 'c']  # List the columns that you want to exclude here.
genome_data_columns = list(set(all_genome_data_columns) - set(excluded_genome_data_columns))
genPredict = genome_data[genome_data_columns]
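One caveat with the set difference: a set does not preserve order, so the remaining columns may come back shuffled. If the original column order matters, a boolean mask built with Index.isin is one possible alternative that stays close to the attempt in the question. The small DataFrame and the column names Sample, Sex and C2 here are invented for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the genome file; column names are assumptions.
genome_data = pd.DataFrame(
    {
        "Sample": ["HG00096", "HG00097"],
        "Sex": ["male", "female"],
        "Geuvadis": [1, 1],
        "C2": [0, 1],
    }
)

excluded_columns = ["Geuvadis", "C2"]

# isin() marks the excluded names; ~ inverts the mask so only the other
# columns are selected, in their original order.
keep_mask = ~genome_data.columns.isin(excluded_columns)
gen_predict = genome_data.loc[:, keep_mask]
```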
Upvotes: 0
Reputation: 1886
You could use DataFrame.drop, like genome_data.drop(['Geuvadis', 'C2', ...], axis=1).
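A minimal sketch of that call on a made-up DataFrame (the data and the column names other than Geuvadis and C2 are assumptions). Note that drop returns a new DataFrame by default and leaves the original untouched, so the result has to be assigned.

```python
import pandas as pd

# Hypothetical data standing in for the genome file.
genome_data = pd.DataFrame(
    {
        "Sample": ["HG00096", "HG00097"],
        "Sex": ["male", "female"],
        "Geuvadis": [1, 1],
        "C2": [0, 1],
    }
)

# axis=1 tells drop to look for the labels among the columns.
gen_predict = genome_data.drop(["Geuvadis", "C2"], axis=1)

# Equivalent spelling using the columns keyword.
gen_predict_kw = genome_data.drop(columns=["Geuvadis", "C2"])
```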
Upvotes: 1