Reputation: 519
I am trying to scale down values in a pandas data frame. The problem is that I have 291 dimensions, so scaling the values one column at a time is time-consuming if done as follows:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler = scaler.fit(dataframe[['dimension_1']])  # double brackets: fit expects 2D input
dataframe['dimension_1'] = scaler.transform(dataframe[['dimension_1']])
Problem: This handles only one dimension, so how can we do this for all 291 dimensions in one shot?
Upvotes: 2
Views: 153
Reputation: 44
I normally use a Pipeline, since it can chain multi-step transformations.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
num_pipeline = Pipeline([('std_scale', StandardScaler())])
transformed_dataframe = num_pipeline.fit_transform(dataframe)
If you need more transformation steps, e.g. filling NAs, you just add them to the list passed to Pipeline.
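For example, a minimal sketch of such a two-step pipeline, assuming scikit-learn's SimpleImputer for the NA-filling step (the frame and its column names are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative frame with a missing value
df = pd.DataFrame({'a': [1.0, 2.0, np.nan, 4.0],
                   'b': [10.0, 20.0, 30.0, 40.0]})

# Fill NaNs with the column median, then standardize, in one fit_transform
num_pipeline = Pipeline([
    ('fill_na', SimpleImputer(strategy='median')),
    ('std_scale', StandardScaler()),
])
scaled = num_pipeline.fit_transform(df)  # NumPy array, one row per input row
```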
Note: The above code works if the datatype of all columns is numeric. If not, we need to select the numeric columns first, transform only those, and write the result back.
Here is the code for the 3 steps:
num_col = dataframe.dtypes[(dataframe.dtypes != 'object') & (dataframe.dtypes != 'bool')].index.to_list()
df_num = dataframe[num_col]                         #1 select the numeric columns
transformed_df = num_pipeline.fit_transform(df_num) #2 transform only those
dataframe[num_col] = transformed_df                 #3 write the result back
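As an aside (a sketch, not part of the answer above), pandas' built-in `select_dtypes` gives the same numeric column list more directly; `'number'` already excludes both `object` and `bool` columns:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0],
    'y': [4, 5, 6],
    'label': ['a', 'b', 'c'],    # object column, left untouched
    'flag': [True, False, True], # bool column, excluded by 'number'
})

num_col = df.select_dtypes(include='number').columns.to_list()  # ['x', 'y']
df[num_col] = StandardScaler().fit_transform(df[num_col])
```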
Upvotes: 1
Reputation: 6799
You can pass in a list of the columns that you want to scale instead of individually scaling each column.
# convert the values 0 and 1 to booleans, so binary columns are excluded from scaling
df.replace({0: False, 1: True}, inplace=True)
# make a copy of dataframe
scaled_features = df.copy()
# take the numeric columns i.e. those which are not of type object or bool
col_names = df.dtypes[(df.dtypes != 'object') & (df.dtypes != 'bool')].index.to_list()
features = scaled_features[col_names]
# Use scaler of choice; here StandardScaler is used
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features[col_names] = features
Upvotes: 2