Reputation: 3
I am trying to scale only specific columns of a NumPy array using scikit-learn's MinMaxScaler. However, the scaling also changes an array that I never passed to fit or transform.
Here is a simple example:
# lib import
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# original np array
original = np.array([[1, 2, 3], [4, 5, 6]])
>>> print(original)
[[1 2 3]
[4 5 6]]
# make a copy of the original array
copy = original
# minmax scaler
minmax_scaler = MinMaxScaler(feature_range=(0, 1))
# fit and transform only 2nd and 3rd positions
copy[:,1:] = minmax_scaler.fit_transform(copy[:,1:])
>>> print(copy)
[[1 0 0]
[4 0 1]]
>>> print(original)
[[1 0 0]
[4 0 1]]
Why are the original array values scaled as well?
Upvotes: 0
Views: 83
Reputation: 19322
This won't work:
copy = original
In Python, assignment statements do not copy objects; they create bindings between a target and an object. It is easy to assume that the = operator creates a new object, but it doesn't.
It only creates a new name that refers to the same object as the original. When we work with mutable objects and want to change one without affecting the other, we need a way to create "real copies" or "clones" of these objects.
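To see this with the array from the question, here is a minimal sketch. The name `alias` is only for illustration and is not from the original post:

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

# Plain assignment: both names now refer to the same array object.
alias = original
print(alias is original)                  # True
print(np.shares_memory(alias, original))  # True

# Writing through one name is visible through the other.
alias[0, 0] = 99
print(original[0, 0])                     # 99
```

This is the same thing that happens with `copy = original` in the question: there is only one array, reachable under two names.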
You need to make a shallow copy or a deep copy instead:
# importing copy module
import copy
# initializing list 1
original = [1, 2, [3,5], 4]
# using copy for shallow copy
copy1 = copy.copy(original)
# using deepcopy for deepcopy
copy2 = copy.deepcopy(original)
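As a rough illustration of how the two kinds of copy behave, the sketch below mutates the same list after copying it. The names `shallow` and `deep` are just for this example:

```python
import copy

original = [1, 2, [3, 5], 4]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Mutate the nested list in place through the original.
original[2].append(7)
# Rebind a top-level element of the original.
original[0] = 100

print(shallow)  # [1, 2, [3, 5, 7], 4]  (the nested list is shared)
print(deep)     # [1, 2, [3, 5], 4]     (fully independent)
```

Neither copy sees the change to `original[0]`, but the shallow copy still sees the change inside the nested list.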
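The difference between the two: a shallow copy creates a new outer object but still shares any nested mutable objects with the original, while a deep copy recursively copies the nested objects as well. The documentation for Python's copy module covers this in more detail.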
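Applied to the NumPy array in the question, a sketch of the fix could look like this. The name `scaled` is my own choice, and converting to float is an extra step I am adding: assigning floats into an integer array truncates them, which is likely why the question's output shows `0` where a scaled value of `1` would be expected. `astype` returns a new array by default, so it also serves as the copy here; `original.copy()` would work too if the array were already float.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

original = np.array([[1, 2, 3], [4, 5, 6]])

# astype() returns a new array by default, so `scaled` is an
# independent float copy and `original` is left alone.
scaled = original.astype(float)

minmax_scaler = MinMaxScaler(feature_range=(0, 1))
# fit and transform only the 2nd and 3rd columns of the copy
scaled[:, 1:] = minmax_scaler.fit_transform(scaled[:, 1:])

print(scaled)    # first column untouched, other columns scaled to [0, 1]
print(original)  # still the unscaled integers
```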
Upvotes: 1