Juba

Reputation: 3

Sklearn scaling: scales the original data as well

I am trying to scale only specific columns of a NumPy array using scikit-learn's MinMaxScaler. However, the scaling also changes an array that I never passed to fit or transform.

Here is a simple example:

# lib import 
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# original np array 
original = np.array([[1, 2, 3], [4, 5, 6]])

>>> print(original)
[[1 2 3]
 [4 5 6]]        

# make a copy of the original array 
copy = original

# minmax scaler 
minmax_scaler = MinMaxScaler(feature_range=(0, 1))

# fit and transform only 2nd and 3rd positions 
copy[:,1:] = minmax_scaler.fit_transform(copy[:,1:])


>>> print(copy)
[[1 0 0]
 [4 0 1]]

>>> print(original)
[[1 0 0]
 [4 0 1]]

Why are the original array values scaled as well?

Upvotes: 0

Views: 83

Answers (1)

Akshay Sehgal

Reputation: 19322

This line does not make a copy:

copy = original

In Python, assignment statements do not copy objects; they create bindings between a target and an object. It is easy to assume that the = operator creates a new object, but it doesn't.

It only creates a new name that refers to the same object, so a change made through either name is visible through the other. When we work with mutable objects and want to leave the original untouched, we need a way to create "real copies" or "clones" of these objects.
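As a minimal sketch of that shared reference (the name alias is only illustrative), the snippet below binds a second name to a list and mutates it through that name:

```python
original = [1, 2, [3, 5], 4]

# Plain assignment: no new list is created, both names refer to one object
alias = original

# Mutating through the second name changes the one shared list
alias[0] = 99

print(alias is original)  # True
print(original[0])        # 99
```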

You need to make an explicit copy instead, either a "shallow copy" or a "deep copy". Python's copy module provides both, shown here on a list:

# importing copy module 
import copy 

# initializing list 1  
original = [1, 2, [3,5], 4] 


# using copy for shallow copy   
copy1 = copy.copy(original)  

# using deepcopy for deepcopy   
copy2 = copy.deepcopy(original) 
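To see where the two kinds of copy differ, here is a small sketch that mutates the original after copying. The names shallow and deep are illustrative stand-ins for copy1 and copy2 above:

```python
import copy

original = [1, 2, [3, 5], 4]
shallow = copy.copy(original)      # new outer list, nested list still shared
deep = copy.deepcopy(original)     # new outer list and new nested list

original[0] = 100       # replace a top-level element
original[2].append(7)   # mutate the nested list in place

print(shallow)  # [1, 2, [3, 5, 7], 4]
print(deep)     # [1, 2, [3, 5], 4]
```

Neither copy sees the top-level replacement, but the shallow copy still sees the change to the nested list because it shares that inner object with the original.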

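A shallow copy duplicates only the outer container and still shares any nested objects with the original; a deep copy duplicates the nested objects as well. The documentation for Python's copy module covers the difference in detail. For the NumPy array in the question, copy = original.copy() is enough, because the array's copy() method allocates a new data buffer.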
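Applied to the array from the question, a fix could look like the sketch below. The names independent and scaled are illustrative, and the conversion to float is my own assumption rather than part of the question: assigning scaled values into an integer array casts them back to integers, so the copy is made as a float array here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

original = np.array([[1, 2, 3], [4, 5, 6]])

# ndarray.copy() returns a new array with its own data buffer
independent = original.copy()

# astype(float) also returns a new array, and a float dtype keeps
# the scaled values from being cast back to integers on assignment
scaled = original.astype(float)

minmax_scaler = MinMaxScaler(feature_range=(0, 1))

# fit and transform only the 2nd and 3rd columns of the copy
scaled[:, 1:] = minmax_scaler.fit_transform(scaled[:, 1:])

print(original)   # still the unscaled values
print(scaled)     # first column untouched, other columns in [0, 1]
print(np.shares_memory(original, scaled))  # False
```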

Upvotes: 1
