Reputation: 545
I am fitting a linear classifier on fairly wide, sparse data, using a number of categorical columns with hash buckets and crossed feature columns as the feature columns.
Later I want to use the model's weights/coefficients in a custom serving infrastructure. I know how to extract the weights from the model, but for the columns mentioned above they are indexed by the already hashed feature values.
I can reconstruct a hash table (value -> hashed value) for a simple categorical column using tf.string_to_hash_bucket_fast, but I am having trouble doing the same for crossed feature columns.
Given a pair of values from the two categorical columns that make up a crossed column, how can I determine which bucket they will fall into?
Upvotes: 4
Views: 2314
Reputation: 545
After inspecting the source code, I found that the simplest way is to construct an input layer over input data containing all the distinct values (or their combinations) of the column.
The result is a dense tensor of 0s and 1s: each row corresponds to one input value, and the 1 sits in the column whose index is that value's hash bucket number (I've verified this for categorical columns; it should be the same for crossed columns).
Here is example code (for both a categorical column and a crossed column):
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.feature_column.input_layer)
# Input features as constant string tensors; each row is one example to look up
actual_sex = {'sex': tf.constant(['male', 'female', 'female', 'male'], dtype=tf.string)}
actual_nationality = {'nationality': tf.constant(['belgian', 'french', 'belgian', 'belgian'], dtype=tf.string)}
actual_sex_nationality = dict(actual_sex, **actual_nationality)
# hashed column: wrap in an indicator column to get a dense one-hot output
sex_hashed_raw = tf.feature_column.categorical_column_with_hash_bucket("sex", 10)
sex_hashed = tf.feature_column.indicator_column(sex_hashed_raw)
# crossed column: same idea, hashing the (sex, nationality) pair into 20 buckets
crossed_sn_raw = tf.feature_column.crossed_column(['sex', 'nationality'], hash_bucket_size=20)
crossed_sn = tf.feature_column.indicator_column(crossed_sn_raw)
# Each input layer row has a 1 at the index of the hash bucket for that example
layer_s = tf.feature_column.input_layer(actual_sex_nationality, [sex_hashed])
layer_sn = tf.feature_column.input_layer(actual_sex_nationality, [crossed_sn])
# No variables are created, so no initializer needs to be run
sess = tf.Session()
print(sess.run(layer_s))
print(sess.run(layer_sn))
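To turn the printed 0/1 matrices into an actual lookup table (value -> bucket number), you can read off the position of the 1 in each row. Below is a minimal sketch in plain Python; the helper name buckets_from_indicator and the sample rows are made up for illustration (they are not real hash values), and it assumes each row contains exactly one 1, which is what you get when every input row holds a single value or a single pair of values.

```python
def buckets_from_indicator(keys, indicator_rows):
    """Map each key to the column index of the single 1 in its row.

    keys           -- one key per row, e.g. 'male' or ('male', 'belgian')
    indicator_rows -- rows of 0/1 values, e.g. the array from sess.run(layer)
    """
    mapping = {}
    for key, row in zip(keys, indicator_rows):
        hot = [i for i, v in enumerate(row) if v == 1]
        if len(hot) != 1:
            raise ValueError("expected exactly one 1 for key %r, got %d" % (key, len(hot)))
        mapping[key] = hot[0]
    return mapping


# Hypothetical indicator rows for illustration only (not real hash buckets).
keys = [('male', 'belgian'), ('female', 'french')]
rows = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
table = buckets_from_indicator(keys, rows)
print(table)
```

In the example above, the keys for layer_sn would be the (sex, nationality) pairs in the same order as the input rows; repeated pairs simply map to the same bucket again.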
Upvotes: 3