crash

Reputation: 4512

Unable to train simple autoencoder in Keras

I'm trying to train an autoencoder in Keras for signal processing, but I'm somehow failing.

My inputs are segments 128 frames long with 6 measures each (acceleration_x/y/z, gyro_x/y/z), so the overall shape of my dataset is (22836, 128, 6), where 22836 is the number of samples.

This is the sample code I'm using for the autoencoder:

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

X_train, X_test, Y_train, Y_test = load_dataset()

# reshape the input, whose size is (22836, 128, 6)
X_train = X_train.reshape(X_train.shape[0], np.prod(X_train.shape[1:]))
X_test = X_test.reshape(X_test.shape[0], np.prod(X_test.shape[1:]))
# now the shape will be (22836, 768)

### MODEL ###
input_shape = [X_train.shape[1]]
X_input = Input(input_shape)

x = Dense(1000, activation='sigmoid', name='enc0')(X_input)
encoded = Dense(350, activation='sigmoid', name='enc1')(x)
x = Dense(1000, activation='sigmoid', name='dec0')(encoded)
decoded = Dense(input_shape[0], activation='sigmoid', name='dec1')(x)

model = Model(inputs=X_input, outputs=decoded, name='autoencoder')

model.compile(optimizer='rmsprop', loss='mean_squared_error')
model.summary()  # summary() prints directly and returns None, so no print() needed

The output of model.summary() is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_55 (InputLayer)        (None, 768)               0         
_________________________________________________________________
enc0 (Dense)                 (None, 1000)              769000    
_________________________________________________________________
enc1 (Dense)                 (None, 350)               350350    
_________________________________________________________________
dec0 (Dense)                 (None, 1000)              351000    
_________________________________________________________________
dec1 (Dense)                 (None, 768)               768768    
=================================================================
Total params: 2,239,118
Trainable params: 2,239,118
Non-trainable params: 0

The training is done via

# train the model
history = model.fit(x = X_train, y = X_train,
                    epochs=5,
                    batch_size=32,
                    validation_data=(X_test, X_test))

where I'm simply trying to learn the identity function, which yields:

Train on 22836 samples, validate on 5709 samples
Epoch 1/5
22836/22836 [==============================] - 27s 1ms/step - loss: 0.9481 - val_loss: 0.8862
Epoch 2/5
22836/22836 [==============================] - 24s 1ms/step - loss: 0.8669 - val_loss: 0.8358
Epoch 3/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8337 - val_loss: 0.8146
Epoch 4/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8164 - val_loss: 0.7960
Epoch 5/5
22836/22836 [==============================] - 25s 1ms/step - loss: 0.8004 - val_loss: 0.7819

At this point, to see how well it performed, I plot some true inputs against the predicted ones:

import matplotlib.pyplot as plt

prediction = model.predict(X_test)
for i in np.random.randint(0, 100, 7):
    pred = prediction[i, :].reshape(128,6)
    # getting only values for acceleration_x
    pred = pred[:, 0]
    true = X_test[i, :].reshape(128,6)
    # getting only values for acceleration_x
    true = true[:, 0]
    # plot original and reconstructed
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(20, 6))
    ax1.plot(true, color='green')
    ax2.plot(pred, color='red')

and these are some of the plots, which appear to be completely wrong:

[plots: original acceleration_x signal (green, top) vs. reconstruction (red, bottom) for three random test samples]

Do you have any suggestions about what's wrong, aside from the small number of epochs (increasing them does not actually seem to make any difference)?

Upvotes: 1

Views: 328

Answers (1)


Reputation: 33410

Your data is not in the range [0, 1], so why do you use sigmoid as the activation function in the last layer? Remove the activation function from the last layer (i.e. make it linear), and it might be better to use relu in the previous layers.
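A minimal sketch of that change, keeping the layer sizes from your model (only the activations differ; the last Dense layer gets no activation, i.e. it is linear):

x = Dense(1000, activation='relu', name='enc0')(X_input)
encoded = Dense(350, activation='relu', name='enc1')(x)
x = Dense(1000, activation='relu', name='dec0')(encoded)
# no activation argument: Dense defaults to a linear output
decoded = Dense(input_shape[0], name='dec1')(x)

model = Model(inputs=X_input, outputs=decoded, name='autoencoder')
model.compile(optimizer='rmsprop', loss='mean_squared_error')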

Also, normalize the training data. You can use feature-wise normalization:

# feature-wise standardization: subtract the mean and divide by the
# standard deviation of each feature, computed on the training set
X_mean = X_train.mean(axis=0)
X_train -= X_mean
X_std = X_train.std(axis=0)
X_train /= X_std + 1e-8  # epsilon guards against division by zero

And don't forget to use the computed statistics (X_mean and X_std) at inference time (i.e. testing) to normalize the test data.
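For example, reusing the training statistics computed above (a minimal sketch; the arrays are normalized in place):

# normalize the test set with statistics computed on the training set,
# never with statistics computed on the test set itself
X_test -= X_mean
X_test /= X_std + 1e-8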

Upvotes: 3
