Reputation: 1137
With this piece of code from the documentation, we can create multiple feature columns to feed batches of data into a DNN model:
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
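For context, these columns are then handed to a canned estimator, roughly like this (a sketch; hidden_units and n_classes are just placeholder values):
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3)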
But the problem is: what is the proper way to transform the original features before they are fed to the input layer? Typical transformations I can think of include normalization and clipping.
tf.feature_column.numeric_column
does have a normalizer_fn parameter for specifying a normalization function. But the example in the doc only demonstrates a scenario where the normalization factors are pre-defined and fixed, like lambda x: (x - 3.2) / 1.5. How can I perform normalization (e.g. MinMaxScaler in sklearn) across all those features without knowing their minimum and maximum beforehand?
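To be concrete, I mean something like this (a sketch; the key name here is just a placeholder):
column = tf.feature_column.numeric_column(
    key='SepalLength',
    normalizer_fn=lambda x: (x - 3.2) / 1.5)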
Also, is there any pipeline implementation where it's possible to do all sorts of feature transformations before they go into the input layer? Is creating a custom estimator with tf.estimator.Estimator
the answer to this problem, or is there anything else I'm not aware of?
Upvotes: 0
Views: 170
Reputation: 761
I can actually answer a part of your question:
But the example in the doc only demonstrates a scenario where the normalization factors are pre-defined and fixed, like lambda x: (x - 3.2) / 1.5.
You can simply use the .min() and .max() methods of a Pandas DataFrame to obtain the minimum and maximum of the desired columns. Let's say you want to normalize some columns in the Pima Indians diabetes dataset; you can do the following:
import pandas as pd

# new_cols is assumed to be the list of column names for this CSV
diabetes = pd.read_csv('pima-indians-diabetes.csv', names=new_cols)

# Min-max normalize the selected columns
cols_to_norm = ['Number_pregnant',
                'Glucose_concentration',
                'Blood_pressure',
                'Triceps',
                'Insulin',
                'BMI',
                'Pedigree']

diabetes[cols_to_norm] = diabetes[cols_to_norm].apply(
    lambda x: (x - x.min()) / (x.max() - x.min()))
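To tie this back to the feature columns in your question (just a sketch, assuming import tensorflow as tf and the diabetes DataFrame from above), you could capture each column's min/max in a closure and pass it as normalizer_fn:
feature_columns = []
for col in cols_to_norm:
    col_min = float(diabetes[col].min())
    col_max = float(diabetes[col].max())
    # Default arguments freeze this column's min/max inside the lambda
    feature_columns.append(
        tf.feature_column.numeric_column(
            key=col,
            normalizer_fn=lambda x, lo=col_min, hi=col_max: (x - lo) / (hi - lo)))
This way the scaling factors are computed once from the training data and baked into the columns, rather than hard-coded as constants.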
Upvotes: 1