jr123456jr987654321

Reputation: 324

Label encoding several columns in a DataFrame, but only those that need it

I have a pandas DataFrame that contains floats, dates, integers, and classes. Because of the sheer number of columns, what would be the most automated way to select only the columns that require it (mainly the ones that are classes) and then label encode those?

FYI: Dates must not be label encoded

Upvotes: 1

Views: 447

Answers (2)

bhola prasad

Reputation: 725

Try this -

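# To select numerical and categorical columns
# include="number" keeps datetime columns out of the numerical list
# (exclude="object" would have let them through)
num_cols = X_train.select_dtypes(include="number").columns.tolist()
cat_cols = X_train.select_dtypes(include="object").columns.tolist()

# you can also pass a list of dtypes, e.g. to pick up pandas "category" columns too
cat_cols = X_train.select_dtypes(include=["object", "category"]).columns.tolist()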

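After that, you can build a pipeline like this:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numerical data preprocessing pipeline: fill gaps with the median, then scale
num_pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# categorical data preprocessing pipeline: fill gaps with "NA", then one-hot encode
# (sparse_output replaces the older sparse argument from scikit-learn 1.2 onwards)
cat_pipe = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="NA"),
    OneHotEncoder(handle_unknown="ignore", sparse_output=False),
)

# full pipeline: each sub-pipeline is applied only to its own column list
full_pipe = ColumnTransformer(
    [("num", num_pipe, num_cols), ("cat", cat_pipe, cat_cols)]
)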
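
The pipeline above one-hot encodes the categorical columns. If plain integer labels are wanted instead, as the question asks, one possible sketch (not part of the original answer) is to convert only the selected columns to pandas category codes. The frame `df` and its column names are made up for illustration, and `cat.codes` is used here as a stand-in for a label encoder; note that it assigns -1 to missing values.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only
df = pd.DataFrame(
    {
        "price": [1.5, 2.0, 3.25],
        "qty": [1, 2, 3],
        "when": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
        "color": pd.Series(["red", "blue", "red"], dtype="object"),
        "size": pd.Categorical(["S", "M", "S"]),
    }
)

# Pick only the class-like columns; floats, ints and dates are not selected
cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

# Replace each selected column with integer codes (categories are sorted first)
encoded = df.copy()
for col in cat_cols:
    encoded[col] = encoded[col].astype("category").cat.codes

print(cat_cols)
print(encoded.dtypes)
```

The date column keeps its datetime dtype because it is never selected, which matches the requirement that dates must not be label encoded.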

Upvotes: 3

Corralien

Reputation: 120391

You can use select_dtypes to select columns by data type or filter to select columns by name.
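
A small sketch of both options, assuming a made-up frame whose class columns happen to share a hypothetical `cat_` prefix (the answer itself does not specify any naming scheme):

```python
import pandas as pd

# Hypothetical example frame; the "cat_" prefix is an assumption for illustration
df = pd.DataFrame(
    {
        "cat_color": pd.Series(["red", "blue"], dtype="object"),
        "cat_shape": pd.Series(["round", "square"], dtype="object"),
        "price": [1.5, 2.0],
        "created": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    }
)

# Select by data type: object and category columns only
by_dtype = df.select_dtypes(include=["object", "category"]).columns.tolist()

# Select by name: substring match, or a regular expression on the column label
by_name = df.filter(like="cat_").columns.tolist()
by_regex = df.filter(regex=r"^cat_").columns.tolist()

print(by_dtype, by_name, by_regex)
```

Selecting by dtype needs no naming convention, while `filter` is handy when the dtype alone is ambiguous, for example integer columns that really hold class IDs.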

Upvotes: 1
