Michael

Reputation: 2566

How to reconstruct raw data from scaled data?

I have some data that I scaled with scikit-learn. Once it is scaled, I would like to recover the original data. Is this possible? If not, how can I map the scaled values back to the original ones?

Here is a toy example:

from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

iris = load_iris()
X = iris.data
X_scale = scale(X)  # standardize each column to zero mean and unit variance

print(X[:4])
print(X_scale[:4])

producing

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
[[-0.90068117  1.03205722 -1.3412724  -1.31297673]
 [-1.14301691 -0.1249576  -1.3412724  -1.31297673]
 [-1.38535265  0.33784833 -1.39813811 -1.31297673]
 [-1.50652052  0.10644536 -1.2844067  -1.31297673]]

How can I recover the original data from the scaled data?

Upvotes: 3

Views: 5943

Answers (2)

Michael

Reputation: 2566

MarkyD43 has provided a great answer to this question. Here is that answer as code, transforming the scaled data back to the original values:

from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

iris = load_iris()
X = iris.data

# Per-column mean and standard deviation of the unscaled data
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)

X_scale = scale(X)

# Invert the scaling: multiply by the std, then add the mean back
X_original = (X_scale * std_of_array) + mean_of_array

print(X[:4])
print(X_original[:4])

producing

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
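
Comparing printed rows only checks the first few values by eye. As a rough sketch, the round trip can be checked over the whole array with np.allclose. The variable names round_trip_ok and sample_std_ok are only illustrative. The second check is there to show that the standard deviation has to be the population one (ddof=0, NumPy's default for std), which is what scale() uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

X = load_iris().data

# Per-column statistics of the unscaled data.
# ndarray.std() defaults to ddof=0 (population std), matching scale().
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)

X_scale = scale(X)

# Invert the scaling and compare against the original, allowing for
# floating-point rounding.
X_original = X_scale * std_of_array + mean_of_array
round_trip_ok = np.allclose(X, X_original)

# For contrast: the sample std (ddof=1) is slightly larger, so the
# reconstruction no longer matches the original values.
X_wrong = X_scale * X.std(axis=0, ddof=1) + mean_of_array
sample_std_ok = np.allclose(X, X_wrong)

print(round_trip_ok, sample_std_ok)
```

If the statistics come from somewhere else (pandas, for example, defaults to ddof=1 in DataFrame.std), it is worth checking which definition was used before inverting.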

Upvotes: 3

MarkyD43

Reputation: 467

One of the most common feature scaling methods, standardization, shifts each feature to a mean of zero and rescales it to a standard deviation of one. Many learning algorithms work better on data scaled this way. The transformation is:

scaled_array = (original_array - mean_of_array)/std_of_array

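In scikit-learn, scale() applies this to each column of the array independently (the default is axis=0). To recover the original data, rearrange the formula to original_array = scaled_array * std_of_array + mean_of_array, using the mean and standard deviation of each column of the unscaled data. The scaled data alone does not carry these values, so compute and keep them when you scale; with them you can transform the scaled data back to the original data at any time.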
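
If keeping the mean and standard deviation around by hand is inconvenient, one option (not part of the original question, which uses the scale() function) is scikit-learn's StandardScaler class. It stores the fitted statistics on the object and provides inverse_transform. A minimal sketch, assuming the same iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# fit_transform learns the per-column mean and std and applies the scaling.
scaler = StandardScaler()
X_scale = scaler.fit_transform(X)

# inverse_transform applies scaled * std + mean using the stored statistics.
X_recovered = scaler.inverse_transform(X_scale)

print(np.allclose(X, X_recovered))
print(scaler.mean_)   # per-column means learned during fit
print(scaler.scale_)  # per-column standard deviations learned during fit
```

The fitted scaler can also be reused to transform new data with the same statistics, which the one-shot scale() function does not offer.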

For more information on how scikit-learn's scaling works, see the preprocessing section of the scikit-learn documentation. To understand more about feature scaling generally, the Wikipedia page on feature scaling is a good place to start.

Upvotes: 4
