Reputation: 1
I have been working through the following example, which works very well. In the example file, the values are stored at 10-minute intervals. However, since I need to bring in additional values that are only available hourly, I deleted from the database all rows whose timestamp does not fall on a full hour. That is: there are now only 1/6 as many rows, plus three extra columns that are not selected in this test run so far.
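For illustration, the filtering amounts to roughly this (a minimal sketch in pandas, not my exact code; the file and column names are taken from the example I am following):

import pandas as pd

# Keep only the rows whose timestamp falls on a full hour
# ('Date Time' in dd.mm.yyyy HH:MM:SS format, as in the example's CSV)
df = pd.read_csv("jena_climate_2009_2016.csv")
timestamps = pd.to_datetime(df["Date Time"], format="%d.%m.%Y %H:%M:%S")
df = df[(timestamps.dt.minute == 0) & (timestamps.dt.second == 0)]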
If I now execute the code exactly as before, the following step
from tensorflow import keras

path_checkpoint = "model_checkpoint.h5"

# Stop training when val_loss has not improved for 5 epochs
es_callback = keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=0, patience=5)

# Save only the weights of the best model seen so far
modelckpt_callback = keras.callbacks.ModelCheckpoint(
    monitor="val_loss",
    filepath=path_checkpoint,
    verbose=1,
    save_weights_only=True,
    save_best_only=True,
)

history = model.fit(
    dataset_train,
    epochs=epochs,
    validation_data=dataset_val,
    callbacks=[es_callback, modelckpt_callback],
)
always reports that val_loss did not improve, for every epoch:
Epoch 1/10
871/871 [==============================] - ETA: 0s - loss: 0.4529
Epoch 1: val_loss did not improve from inf
871/871 [==============================] - 288s 328ms/step - loss: 0.4529 - val_loss: nan
I think it is related to this previous code block,
split_fraction = 0.715
train_split = int(split_fraction * int(df.shape[0]))

step = 6  # sampling rate: with 10-minute data, every 6th record = hourly
past = 720  # length of the input window, in records
future = 72  # how far ahead the target lies, in records
learning_rate = 0.001
batch_size = 256
epochs = 10

def normalize(data, train_split):
    # Standardize with the mean/std of the training split only
    data_mean = data[:train_split].mean(axis=0)
    data_std = data[:train_split].std(axis=0)
    return (data - data_mean) / data_std
where the original author specifies that only every sixth record should be used. Since I already removed the intermediate records beforehand, the code should now use all records, so I tried setting step = 1, but without success. It still comes back with the message that val_loss did not improve from inf.
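As far as I understand, step is consumed further down as the sampling rate when the input windows are built, roughly like this (a sketch of the example's dataset construction as I read it; x_train and y_train come from its preprocessing):

from tensorflow import keras

# step thins each input window, so with hourly rows step = 1 should use every record
dataset_train = keras.preprocessing.timeseries_dataset_from_array(
    x_train,
    y_train,
    sequence_length=int(past / step),
    sampling_rate=step,
    batch_size=batch_size,
)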
Does anyone know what else I need to adjust so the code accounts for there now being only one-sixth as many rows as it originally expects? Initially the result should come out with the same values as in the example, because I have not yet used the new data.
Upvotes: 0
Views: 32
Reputation: 1
The issue was inside the .csv file. In two of the 300,000 rows the date was formatted as 25.10.18, while in all the other rows it was 25.10.2018.
After editing those rows to the consistent dd.mm.yyyy format, the val_loss decreased as expected.
If you are facing the same issue, this code can help you find the incorrectly formatted rows, because strict parsing raises a ValueError pointing at the first value that does not match:
import pandas as pd

date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
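If you would rather list every malformed row at once instead of stopping at the first parse error, a small variant (run it before popping the column) coerces unparseable dates to NaT and filters on them:

# errors='coerce' turns unparseable dates into NaT instead of raising
parsed = pd.to_datetime(df['Date Time'], format='%d.%m.%Y %H:%M:%S', errors='coerce')
print(df[parsed.isna()])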
Upvotes: 0