How to preprocess features before training a TensorFlow model end-to-end in TF graph

Question

By this piece of code from the documentation, we can create multiple features to feed batches of data into a DNN model:

my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

But the problem is what is the proper way to transform the original features before they are fed to the input layer? Typical transformations that I can think of include normalization and clipping.

tf.feature_column.numeric_column does have a parameter specifying the normalization function. But the example in the doc only demonstrate a scenario where the normalization factors are pre-defined and fixed, like lambda x: (x-3.2)/1.5. How can I perform normalization (e.g. MinMaxScaler in sklearn) across all those features without knowing its maximum and minimum beforehand.

Also, is there any pipeline implementation where it's possible to do all sorts of feature transformations before they go into the input layer? Is creating a custom estimator tf.estimator.Estimator the answer to this problem? or anything else I'm not aware of.

How to preprocess features before training a TensorFlow model end-to-end in TF graph

Answers (1)

Related Questions