How to reconstruct raw data from scaled data?

Question

I have some data that I scaled with scikit-learn. Once it is scaled, I would like to recover the original data. Is this possible? If not, how can I map the scaled values back to the original data?

Here is a toy example:

from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

iris = load_iris()
X = iris.data
X_scale = scale(X)  # standardize each column to mean 0, std 1
print(X[:4])
print(X_scale[:4])

which produces:

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
[[-0.90068117  1.03205722 -1.3412724  -1.31297673]
 [-1.14301691 -0.1249576  -1.3412724  -1.31297673]
 [-1.38535265  0.33784833 -1.39813811 -1.31297673]
 [-1.50652052  0.10644536 -1.2844067  -1.31297673]]

How can I recover the original data from the scaled data?

MarkyD43 · Accepted Answer

One of the most common feature scaling methods, standardization, transforms each feature so that its mean is zero and its standard deviation is one. Many learning algorithms work better on data scaled this way. It is computed as follows:

scaled_array = (original_array - mean_of_array)/std_of_array
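As a minimal sketch, the formula can be applied with NumPy and compared against `sklearn.preprocessing.scale` on the iris data from the question. This assumes the default behaviour of `scale`: each column is standardized independently, using the population standard deviation (`ddof=0`, NumPy's default for `std`).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

X = load_iris().data

# Per-column (axis=0) mean and population standard deviation (ddof=0),
# matching what sklearn.preprocessing.scale uses by default.
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)

# The standardization formula, broadcast across every row.
scaled_array = (X - mean_of_array) / std_of_array

# Should agree with scikit-learn up to floating-point error.
print(np.allclose(scaled_array, scale(X)))  # True
```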

By default, scikit-learn's `scale` standardizes each column (feature) independently in this way. To recover the original data, rearrange the formula above: original_array = scaled_array * std_of_array + mean_of_array. This requires the mean and standard deviation of each column of the unscaled data, so compute and keep them when you scale. With those values you can transform the scaled data back to the original data at any time.
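A short sketch of the rearranged formula, assuming the per-column statistics are computed from the unscaled `X` before calling `scale` (the variable names are illustrative, not part of any API):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

X = load_iris().data

# Keep the statistics of the unscaled data; they are needed to undo the scaling.
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)

X_scale = scale(X)

# Invert the scaling: original = scaled * std + mean (broadcast per column).
X_restored = X_scale * std_of_array + mean_of_array

print(X_restored[:4])
print(np.allclose(X_restored, X))  # True
```

The restored values match the originals only up to floating-point rounding, which is why the comparison uses `np.allclose` rather than `==`.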

For more information on how scikit-learn's scaling works, see the scikit-learn preprocessing documentation. To understand feature scaling more generally, the Wikipedia article on feature scaling is a good place to start.
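As an alternative not covered in the answer, scikit-learn's `StandardScaler` stores the per-column statistics for you: `fit` records `mean_` and `scale_` (the population standard deviation), and `inverse_transform` undoes the scaling. A minimal sketch on the same iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# fit_transform learns mean_ and scale_ per column, then standardizes X.
scaler = StandardScaler()
X_scale = scaler.fit_transform(X)

# inverse_transform multiplies by scale_ and adds mean_ back.
X_restored = scaler.inverse_transform(X_scale)

print(scaler.mean_)
print(scaler.scale_)
print(np.allclose(X_restored, X))  # True
```

Keeping the fitted `scaler` object around is what makes the inversion possible later, much like keeping `mean_of_array` and `std_of_array` in the manual approach.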
