Reputation: 2566
I have some data that I scaled with scikit-learn. Once it is scaled, I would like to recover the original data. Is this possible? If not, how can I map the scaled values back to the original data?
Here is a toy example:
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
iris = load_iris()
X = iris.data
X_scale = scale(X)
print(X[:4])
print(X_scale[:4])
producing
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]]
[[-0.90068117 1.03205722 -1.3412724 -1.31297673]
[-1.14301691 -0.1249576 -1.3412724 -1.31297673]
[-1.38535265 0.33784833 -1.39813811 -1.31297673]
[-1.50652052 0.10644536 -1.2844067 -1.31297673]]
How can I recover the original data from the second (scaled) array?
Upvotes: 3
Views: 5943
Reputation: 2566
MarkyD43 has provided a great answer to this question. Here is a code version that transforms the scaled data back to the original values:
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
iris = load_iris()
X = iris.data
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)
X_scale = scale(X)
X_original = (X_scale * std_of_array) + mean_of_array
print(X[:4])
print(X_original[:4])
producing
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]]
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]]
Upvotes: 3
Reputation: 467
One of the most common feature scaling methods standardizes the data so that each feature has a mean of zero and a standard deviation of one. This is extremely useful for many learning algorithms. It is achieved with the following:
scaled_array = (original_array - mean_of_array)/std_of_array
In Sklearn, scale standardizes each column of the array in this way by default. To recover the original data, rearrange the formula above:
original_array = (scaled_array * std_of_array) + mean_of_array
Here mean_of_array and std_of_array are the mean and standard deviation of each column of the unscaled data, so compute and keep them before (or when) you scale. With those two arrays you can transform the scaled data back to the original data at any time.
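As a minimal sketch of the same idea, the snippet below first checks that scale agrees with the manual formula, then uses scikit-learn's StandardScaler, which stores the per-column statistics when fitted and exposes inverse_transform to undo the scaling. The variable names (manual, matches_manual, scaler, X_scaled, X_recovered) are illustrative choices, not taken from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, scale

X = load_iris().data

# Check that scale() matches the manual formula, column by column.
# NumPy's std() defaults to the population standard deviation (ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)
matches_manual = np.allclose(scale(X), manual)

# StandardScaler remembers the per-column mean and scale after fitting,
# so the same object can reverse the transformation later.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_recovered = scaler.inverse_transform(X_scaled)

print(matches_manual)
print(np.allclose(X_recovered, X))
```

Keeping the fitted scaler object around is usually more convenient than tracking the mean and standard deviation by hand, since the same object can also be applied to new data with transform.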
For more information on how Sklearn's scaling works, the docs are here. To understand more about feature scaling generally, the wiki page is a good place to start.
Upvotes: 4