k191255 Sara Sameer

Reputation: 1

Inverse target variable scaling results in incorrect prediction results

I am working with an 'Automated Essay Grading' dataset that has multiple sets of essays, each with its own target score range. For example, set 01 has a score range of 2 to 12, set 02 has a range of 0 to 3, and so on. To normalize these different ranges for training, I used MinMaxScaler to scale all scores to between 0 and 1.
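For context, the scaling step looks roughly like this (a minimal sketch; df_train, df_test and the 'score' column are placeholders for my actual variable names):

from sklearn.preprocessing import MinMaxScaler

# Fit one scaler on the training scores and reuse it for the test scores
# ('score' and the dataframe names are placeholders for my actual setup)
scaler = MinMaxScaler(feature_range=(0, 1))
target_train = scaler.fit_transform(df_train[['score']])  # 2D input, shape (n_samples, 1)
target_test = scaler.transform(df_test[['score']])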

However, the predictions are much worse than expected, with a large gap between the predicted and target scores. I suspect the scaling process is the cause, because scaling the predictions back with the inverse_transform method does not produce accurate results. This is how I scale back:

test_loss, test_mae = lstm_model.evaluate([padded_essay_test, features_test], target_test)

# Make predictions on test data
predictions = lstm_model.predict([padded_essay_test, features_test])

# Scale the predictions back to the original score range
original_predictions = scaler.inverse_transform(predictions)
df_predictions = pd.DataFrame(original_predictions, columns=['original_predictions'])

# Scale the (scaled) target scores back as well, for comparison
scores_2d = [[score] for score in df_test['predicted_score']]
original_target = scaler.inverse_transform(scores_2d)
df_target = pd.DataFrame(original_target, columns=['original_target'])

print('Test Loss:', test_loss)
print('Test MAE:', test_mae)
print(df_predictions)
print(df_target)

Here are the results:

Test Loss: 0.16554389894008636
Test MAE: 0.32293763756752014
      original_predictions
0                 6.706153
1                 6.293279
2                 7.408381
3                 6.629674
4                 6.368900
...                    ...
4213             15.695969
4214             14.502607
4215             13.892921
4216             14.528075
4217             15.792664

[4218 rows x 1 columns]
      original_target
0                 7.0
1                 8.0
2                 9.0
3                 9.0
4                 9.0
...               ...
4213             33.0
4214             35.0
4215             38.0
4216             32.0
4217             39.0

[4218 rows x 1 columns]

I also tried scaling the predictions back manually for each essay set with the formula (predicted score * (max score - min score)) + min score instead of using inverse_transform, but that did not work either.

# Rescale each essay set's predictions with that set's own min/max
# (essay_set_ranges maps essay set name -> (min score, max score))
subset_predictions_list = []
for subset_name, subset_range in essay_set_ranges.items():
    subset_predictions = df_predictions.loc[df_test['essay_set'] == int(subset_name)]
    subset_min, subset_max = subset_range
    subset_predictions = subset_predictions * (subset_max - subset_min) + subset_min
    subset_predictions_list.append(subset_predictions)

df_subset_predictions = pd.concat(subset_predictions_list)
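On a single value the formula itself behaves as I expect (a minimal check, using set 01's range of 2 to 12 and a hypothetical normalized prediction of 0.5):

# Minimal check of the manual formula on one hypothetical value
subset_min, subset_max = 2, 12      # score range of essay set 01
normalized_prediction = 0.5         # hypothetical model output in [0, 1]
rescaled = normalized_prediction * (subset_max - subset_min) + subset_min
print(rescaled)                     # 7.0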

There are no outliers in the test data, and the test score range lies within the training score range. I have also looked at other StackOverflow posts about this, but none of the solutions suggested there fixes the problem described above.

Upvotes: 0

Views: 138

Answers (0)
