Avv

Reputation: 519

Scaling down high-dimensional pandas DataFrame data using sklearn

I am trying to scale the values in a pandas DataFrame. The problem is that I have 291 dimensions, so scaling them one at a time, as below, is time-consuming:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# StandardScaler expects 2-D input, so select the column with double brackets
scaler = scaler.fit(dataframe[['dimension_1']])
dataframe['dimension_1'] = scaler.transform(dataframe[['dimension_1']]).ravel()

Problem: This handles only one dimension. How can we do this for all 291 dimensions in one shot?

Upvotes: 2

Views: 153

Answers (2)

I normally use a pipeline, since it can chain multi-step transformations.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
num_pipeline = Pipeline([('std_scale', StandardScaler())])
transformed_dataframe = num_pipeline.fit_transform(dataframe)

If you need more transformation steps, e.g. filling NA values, just add them to the list passed to Pipeline (line 3 of the code).
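As a rough sketch of adding a step to that list: the example below puts an imputer in front of the scaler. The toy DataFrame, the step name `fill_na`, and the choice of `SimpleImputer(strategy="median")` are assumptions made for illustration, not part of the original answer. It also wraps the result back into a DataFrame, because `fit_transform` returns a NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the 291-column frame
toy = pd.DataFrame({
    "dimension_1": [1.0, 2.0, np.nan, 4.0],
    "dimension_2": [10.0, 20.0, 30.0, 40.0],
})

num_pipeline = Pipeline([
    ("fill_na", SimpleImputer(strategy="median")),  # extra step added to the list
    ("std_scale", StandardScaler()),
])

# fit_transform returns a NumPy array, so rebuild a DataFrame
# with the original column labels and index
transformed = pd.DataFrame(
    num_pipeline.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)
```

Steps run in list order, so the imputer fills the missing value before the scaler computes its statistics.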

Note: The above code works only if all columns are numeric. If not, we need to

  1. select only the numeric columns
  2. pass them into the pipeline, then
  3. put the result back into the original dataframe.

Here is the code for the 3 steps:

num_col = dataframe.dtypes[dataframe.dtypes != 'object'][dataframe.dtypes != 'bool'].index.to_list()
df_num = dataframe[num_col] #1
transformed_df = num_pipeline.fit_transform(df_num) #2
dataframe[num_col] = transformed_df #3
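For comparison, here is a sketch of the same three steps using `DataFrame.select_dtypes` to pick the numeric columns. The toy frame and its column names are made up for illustration. Note that `include="number"` is slightly stricter than the filter above: besides object and bool columns, it also skips datetime columns.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical frame mixing numeric, string and boolean columns
mixed = pd.DataFrame({
    "height": [1.5, 1.7, 1.9],
    "weight": [50.0, 70.0, 90.0],
    "name": ["a", "b", "c"],
    "flag": [True, False, True],
})

# 1. select only the numeric columns (bool and object are not "number")
num_cols = mixed.select_dtypes(include="number").columns.to_list()

# 2. scale them, 3. write the result back into a copy of the frame
scaled = mixed.copy()
scaled[num_cols] = StandardScaler().fit_transform(mixed[num_cols])
```

Working on a copy leaves the original frame untouched, which helps if the unscaled values are needed later.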

Upvotes: 1

yudhiesh

Reputation: 6799

You can pass in a list of the columns that you want to scale instead of individually scaling each column.

from sklearn.preprocessing import StandardScaler

# replace the values 0 and 1 with False and True; this applies to every column of df
df.replace({0: False, 1: True}, inplace=True)

# make a copy of dataframe
scaled_features = df.copy()

# take the numeric columns i.e. those which are not of type object or bool
col_names = df.dtypes[df.dtypes != 'object'][df.dtypes != 'bool'].index.to_list()
features = scaled_features[col_names]

# Use scaler of choice; here Standard scaler is used
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)

scaled_features[col_names] = features
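To make clear what the scaler does to every column at once, here is a small check, on made-up data, that `StandardScaler` matches the per-column formula `(x - mean) / std`, with the population standard deviation (`ddof=0`). The frame `values` and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical two-column frame
values = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [0.0, 10.0, 20.0, 50.0],
})

# All columns are scaled in one call
by_sklearn = StandardScaler().fit_transform(values.values)

# The same computation written out by hand, column by column
by_hand = ((values - values.mean()) / values.std(ddof=0)).values

same = np.allclose(by_sklearn, by_hand)
```

Here `same` should come out `True`. pandas' default `std()` uses `ddof=1`, so it would not match without the explicit `ddof=0`.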

Upvotes: 2
