James Wong

Reputation: 1137

How to preprocess features before training a TensorFlow model end-to-end in TF graph

This snippet from the documentation creates one numeric feature column per input key, which can then be used to feed batches of data into a DNN model:

my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

But what is the proper way to transform the original features before they are fed to the input layer? Typical transformations I can think of include normalization and clipping.

tf.feature_column.numeric_column does have a parameter, normalizer_fn, for specifying a normalization function. But the example in the docs only demonstrates a scenario where the normalization constants are predefined and fixed, like lambda x: (x - 3.2) / 1.5. How can I perform normalization (e.g. MinMaxScaler in sklearn) across all those features without knowing their maximum and minimum beforehand?

Also, is there a pipeline implementation that makes it possible to apply all sorts of feature transformations before the data goes into the input layer? Is creating a custom estimator with tf.estimator.Estimator the answer, or is there something else I'm not aware of?

Upvotes: 0

Views: 170

Answers (1)

hexpheus

Reputation: 761

I can answer part of your question:

But the example in the doc only demonstrate a scenario where the normalization factors are pre-defined and fixed, like lambda x: (x-3.2)/1.5.

You can use the .min() and .max() methods of a pandas DataFrame to obtain the minimum and maximum of each column. Say you want to normalize some columns in the diabetes dataset; you can do the following:

import pandas as pd

# new_cols is the list of column names for the CSV (defined elsewhere)
diabetes = pd.read_csv('pima-indians-diabetes.csv', names=new_cols)

# Min-max scale the selected columns to the [0, 1] range
cols_to_norm = ['Number_pregnant',
                'Glucose_concentration',
                'Blood_pressure',
                'Triceps',
                'Insulin',
                'BMI',
                'Pedigree']

diabetes[cols_to_norm] = diabetes[cols_to_norm].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
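
If you would rather keep the scaling inside the feature columns, one option is to compute the minimum and maximum on the training data first and then build one normalization function per feature. Below is a minimal sketch of that idea in plain Python. The helper names fit_min_max and make_normalizer and the toy train_x values are made up for illustration; only normalizer_fn is an actual parameter of tf.feature_column.numeric_column.

```python
def fit_min_max(columns):
    """Return {key: (min, max)} computed from the training columns only."""
    return {key: (min(values), max(values)) for key, values in columns.items()}


def make_normalizer(lo, hi):
    """Build a function that maps the range [lo, hi] onto [0, 1].

    A factory is used so that each function captures its own lo/hi;
    a bare lambda written inside a loop would only see the last values.
    """
    span = hi - lo
    if span == 0:
        # Constant column: avoid dividing by zero, map training values to 0.
        return lambda x: x - lo
    return lambda x: (x - lo) / span


# Hypothetical toy training data standing in for train_x.
train_x = {
    "Glucose_concentration": [80.0, 120.0, 200.0],
    "BMI": [20.0, 30.0, 40.0],
}

stats = fit_min_max(train_x)
normalizers = {key: make_normalizer(lo, hi) for key, (lo, hi) in stats.items()}

# With TensorFlow available, each function could be handed to a column as:
#   tf.feature_column.numeric_column(key=key, normalizer_fn=normalizers[key])
scaled = [normalizers["BMI"](v) for v in train_x["BMI"]]
print(scaled)  # [0.0, 0.5, 1.0]
```

Because the statistics come from the training split only, the same functions can be reused unchanged on validation and test data, which may then produce values slightly outside [0, 1].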

Upvotes: 1
