Reputation: 465
I am working on a binary classification problem whose data contains a STREET field. As a first step, I used a Tokenizer to get a word list (with the frequency of how often each word appears across the dataset). I then used this information to create two columns in my DataFrame, one holding the word index and one holding how often that word was used:
def buildWeightList(indexes, tokenizer):
    weights = []
    for index in indexes:
        if index == 0:
            weights.append(0)
        else:
            weights.append(tokenizer.index_docs.get(index))
    return weights
street_tokenized = ts.texts_to_sequences(data['STREETPRO'])
data['STREETPRO'] = tf.keras.preprocessing.sequence.pad_sequences(street_tokenized, maxlen=1)
data['STREETFREQ'] = buildWeightList(data['STREETPRO'], ts)
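To make the lookup easier to follow without TensorFlow, here is a minimal sketch of the same logic. It assumes a plain dict as a stand-in for `tokenizer.index_docs` (token index to number of documents containing that token); the sample values and the name `build_weight_list` are hypothetical and used only for illustration.

```python
# Hypothetical stand-in for tokenizer.index_docs:
# token index -> number of documents that contain the token.
index_docs = {1: 3, 2: 1}


def build_weight_list(indexes, index_docs):
    """Return one weight per token index; the padding index 0 gets weight 0."""
    weights = []
    for index in indexes:
        if index == 0:
            # 0 is the value pad_sequences fills in for empty sequences.
            weights.append(0)
        else:
            # .get returns None if the index is not in the mapping.
            weights.append(index_docs.get(index))
    return weights


# Made-up padded token indices, one per row.
padded = [1, 0, 2, 1]
weights = build_weight_list(padded, index_docs)
print(weights)
```

Note that every row whose street was padded to index 0 ends up with a weight of 0.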
After converting the DataFrame to a TensorFlow Dataset, I used the following code to add it to my feature columns:
vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()
street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='STREETPRO', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
weighted_street = tf.feature_column.weighted_categorical_column(categorical_column=street_voc, weight_feature_key='STREETFREQ', dtype=tf.dtypes.int64)
street_one_hot = feature_column.indicator_column(weighted_street)
feature_columns.append(street_one_hot)
As you can see, I used the function tf.feature_column.weighted_categorical_column. Unfortunately, I get the following error when I try to train my model:
InvalidArgumentError: indices and values rows (indexing dimension) must match. (indices = 5, values = 1)
[[node sequential/dense_features_2/STREETPRO_weighted_by_STREETFREQ_indicator/SparseMerge/SparseReorder (defined at <ipython-input-40-964101dd1dc8>:3) ]] [Op:__inference_train_function_986]
Furthermore I get the following warning:
WARNING:tensorflow:From ...\feature_column\feature_column_v2.py:4366: sparse_merge (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Now I have two questions:
First: Does it make sense to use this function for the problem I described? Unfortunately, I couldn't find a detailed description of how this function works (only this short documentation: https://www.tensorflow.org/api_docs/python/tf/feature_column/weighted_categorical_column)
Second: How can I fix the described error?
Upvotes: 0
Views: 220
Reputation: 11
Chiming in a year later to report that I had the same problem and "solved" it by eliminating any examples with zero weights. It might be an issue with TensorFlow converting something to a sparse representation and omitting the zeros.
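A rough sketch of that workaround, assuming the rows are available as plain Python dicts with the question's STREETPRO and STREETFREQ keys (the sample values and the helper name `drop_zero_weight_rows` are made up for illustration). This only shows the filtering step; I have not confirmed why it avoids the error.

```python
# Made-up sample rows using the column names from the question.
rows = [
    {"STREETPRO": 4, "STREETFREQ": 7},
    {"STREETPRO": 0, "STREETFREQ": 0},  # padded street, zero weight
    {"STREETPRO": 2, "STREETFREQ": 1},
]


def drop_zero_weight_rows(rows, weight_key="STREETFREQ"):
    """Keep only the examples whose weight is non-zero."""
    return [row for row in rows if row[weight_key] != 0]


filtered = drop_zero_weight_rows(rows)
print(filtered)

# With a pandas DataFrame, the equivalent filter would be a boolean mask:
# data = data[data['STREETFREQ'] != 0]
```

The filter has to run before the DataFrame is converted to a Dataset, so that no zero-weight example reaches the feature column.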
Upvotes: 1