Deep learning regression models with datasets that have features and labels on different scales and distributions

Question

I am building a deep learning model trained on a combination of many datasets. The datasets differ in their biases or measurement errors, and sometimes in the distributions of both their features and their labels.

For the features, I have been standardizing each feature within its own dataset using StandardScaler before combining the datasets for training, because of the presence of outliers. I have been doing the same with the labels. However, I don't know how to handle the test set. I figured I could standardize the test features within the test set in the same way, but I'm not sure I can do that with the test labels. Applying the normalization parameters from the training labels to the test labels doesn't seem to make sense either, since each training dataset has its own parameters. Any guidance on a normalization scheme for training and test features and labels would be appreciated!

(All features and labels are continuous values)
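
To make the setup concrete, here is a minimal sketch of one possible scheme, not a definitive answer. It assumes each test sample can be traced back to one of the source datasets, so that source's training statistics can be reused. `ColumnScaler` is a hypothetical plain-Python stand-in for StandardScaler (per-column mean and population standard deviation), and the toy numbers and the placeholder prediction are made up for illustration. The idea is to leave the test labels in their original units and map the model's normalized predictions back through the inverse of the label scaling before comparing.

```python
from statistics import fmean, pstdev


class ColumnScaler:
    """Hypothetical stand-in for StandardScaler: per-column mean and population std."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [fmean(c) for c in cols]
        # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
        self.stds = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [[(v - m) / s for v, m, s in zip(r, self.means, self.stds)]
                for r in rows]

    def inverse_transform(self, rows):
        return [[v * s + m for v, m, s in zip(r, self.means, self.stds)]
                for r in rows]


# Toy training split for ONE source dataset (values invented for illustration).
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_train = [[100.0], [200.0], [300.0]]
# Test features assumed to come from the same source dataset.
X_test = [[4.0, 40.0]]

# Fit the statistics on the training split only, one scaler pair per dataset.
x_scaler = ColumnScaler().fit(X_train)
y_scaler = ColumnScaler().fit(y_train)

X_train_z = x_scaler.transform(X_train)
y_train_z = y_scaler.transform(y_train)
# Reuse the training statistics for the test features.
X_test_z = x_scaler.transform(X_test)

# Placeholder for a model output in normalized label space (not a real prediction).
pred_z = [[0.5]]
# Map the prediction back to original label units; test labels stay unscaled.
pred = y_scaler.inverse_transform(pred_z)
```

With several source datasets, one scaler pair per dataset could be kept in a dictionary keyed by dataset name. This sketch does not cover a test set from a source never seen in training, which is the harder case raised above.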

Answers (0)