silent_hunter

Reputation: 2508

Dropping several columns in data frame from list selection

I am working with the Titanic dataset. I want to separate the numerical columns from the categorical columns. I tried to do this with these lines of code:

from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

print("Numeric columns")
for column in dataset.columns:
    if is_numeric_dtype(dataset[column]):
        print(column)
print("----------------------------------")
print("Category columns")
for column in dataset.columns:
    if is_string_dtype(dataset[column]):
        print(column)

Output:

Numeric columns
Unnamed: 0
credit_amount
installment_commitment
residence_since
age
existing_credits
num_dependents
accepted
----------------------------------
Category columns
checking_status
duration
credit_history
purpose
savings_status
employment
personal_status
other_parties
property_magnitude
other_payment_plans
housing
job
own_telephone
foreign_worker
change_purpose
change_duration

So now I can clearly see which columns are numerical and which are categorical. Now I want to drop all the numerical columns, whose names are stored in columns_names:

dataset_numerical = dataset.select_dtypes(include=['int64'])
columns_names = dataset_numerical.tolist()
dataset = dataset.drop([columns_names], axis=1)

This is what is stored in columns_names:

['Unnamed: 0',
 'credit_amount',
 'installment_commitment',
 'residence_since',
 'age',
 'existing_credits',
 'num_dependents',
 'accepted']

So obviously I made a mistake in the last line of code. Can anybody help me solve this?

I also tried these lines of code, but again without success:

to_drop = columns_names
to_drop_stripped = [x.strip() for x in to_drop.split(',')]
dataset.drop(columns=to_drop_stripped)

In the end, I expect to drop all columns whose names are stored in columns_names.

Upvotes: 2

Views: 61

Answers (2)

sophocles

Reputation: 13831

Some minor tweaks are needed in your two chunks of code. It's hard to be sure this will work for you, as I can't replicate your dataset exactly, but I think the code below should work now.

# Code block 1
dataset_numerical = dataset.select_dtypes(include=['int64'])
columns_names = dataset_numerical.columns.tolist()             # added .columns
dataset = dataset.drop(columns_names, axis=1)                  # removed the extra [] brackets

# Code block 2 (an alternative to the drop in block 1, not a follow-up)
to_drop = columns_names
to_drop_stripped = [x.strip() for x in to_drop]     # removed .split(','): to_drop is already a list
dataset = dataset.drop(columns=to_drop_stripped)    # drop returns a new DataFrame, so assign it
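
For anyone who wants to try this without the original data, here is a minimal sketch on a made-up DataFrame (the values are hypothetical and only a few of the column names are borrowed from the question). It illustrates the two points above: a DataFrame has no tolist method, so the names have to come from .columns, and drop expects a flat list of labels rather than a list wrapped in another list.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
dataset = pd.DataFrame({
    "age": [22, 35, 58],
    "credit_amount": [1169, 5951, 2096],
    "housing": ["own", "rent", "own"],
    "job": ["skilled", "skilled", "unskilled"],
})

# Column names of the integer columns, as a flat list of strings.
columns_names = dataset.select_dtypes(include=["int64"]).columns.tolist()

# Pass the list itself to drop, not [columns_names].
remaining = dataset.drop(columns=columns_names)

print(columns_names)
print(remaining.columns.tolist())
```

Using drop(columns=...) here is equivalent to drop(..., axis=1); it is only a matter of which spelling reads better to you.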

Upvotes: 2

DreamLand

Reputation: 42

# Check the dtypes with
dataset.dtypes

# For a list of the columns with strings
print(dataset.select_dtypes(include=object).columns.values)

# Replace object with the dtype you are interested in, without quotes
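
As a possible extension of the same idea, select_dtypes also takes an exclude argument, so the numeric and non-numeric column names can be split in one step each. This is only a sketch on hypothetical toy data (the df name and its values are made up); "number" is the pandas selector for all numeric dtypes, which avoids having to list int64 and float64 separately.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
df = pd.DataFrame({
    "age": [22, 35, 58],
    "rate": [0.5, 1.25, 2.0],
    "housing": ["own", "rent", "own"],
})

# All numeric columns (integers and floats).
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric.
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```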

Upvotes: 1
