Reputation: 173
sklearn.preprocessing.normalize
only supports 2D arrays. However, I currently have a 3D array for LSTM model training, shaped (batch, step, features), and I wish to normalize the features.
I have tried
tf.keras.utils.normalize(X_train, axis=-1, order=2)
but it does not give the result I want.
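As far as I can tell, normalize rescales each vector along the given axis to unit L2 norm, which is not the same as mapping values into a fixed range. A quick check on a made-up toy array:

import numpy as np
import tensorflow as tf

x = np.random.uniform(-20, 100, (2, 3, 4))  # toy array, same layout as my real data
x_norm = tf.keras.utils.normalize(x, axis=-1, order=2)

# Every feature vector along the last axis now has unit L2 norm,
# but individual values are not bounded to a fixed (min, max) range.
print(np.linalg.norm(x_norm, axis=-1))  # all ~1.0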
Another way is to fold the 3D array into a 2D array, scale it, and reshape it back.
print(X_train.shape)
print(max(X_train[0][0]))
Output:
(1883, 100, 68)
6.028588763956215
from sklearn.preprocessing import StandardScaler

# Flatten (batch, step, features) to 2D, fit/transform, then restore the 3D shape
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)
print(X_train.shape)
print(max(X_train[0][0]))
print(min(X_train[0][0]))
Output:
(1883, 100, 68)
3.2232538993444533
-1.9056918449890343
The values are still not within -1 and 1. How should I approach this?
Upvotes: 3
Views: 4087
Reputation: 22031
As suggested in the comments, I am posting this as an answer.
You can scale a 3D array with sklearn preprocessing methods: you simply reshape the data to 2D to fit the scaler, and then reshape it back to 3D. This can be done in a few lines of code.
Note that StandardScaler standardizes each feature to zero mean and unit variance, so its output is not bounded. If you want the scaled data to be in the range (-1, 1), use MinMaxScaler with feature_range=(-1, 1):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.uniform(-20, 100, (1883, 100, 68))
X_test = np.random.uniform(-20, 100, (100, 100, 68))
print(X_train.shape)
print(X_train.min().round(5), X_train.max().round(5)) # -20, 100
scaler = MinMaxScaler(feature_range=(-1,1))
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)
print(X_train.shape)
print(X_train.min().round(5), X_train.max().round(5)) # -1, 1
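One possible refinement (a sketch, not required for the result above): reshaping to (samples, steps*features) learns a separate min/max for every (timestep, feature) pair. If you would rather learn one min/max per feature, shared across all samples and timesteps, flatten along the feature axis instead:

# Variant: one min/max per feature across all samples and timesteps,
# instead of one per (timestep, feature) pair as above.
n_features = X_train.shape[-1]  # 68 in this example
scaler = MinMaxScaler(feature_range=(-1, 1))
X_train = scaler.fit_transform(X_train.reshape(-1, n_features)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(-1, n_features)).reshape(X_test.shape)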
Upvotes: 6