Ling

Reputation: 465

TensorFlow: Error using weighted_categorical_column

I am working on a binary classification problem whose data contains a STREET field. As a first step, I used tokenization to get a word list (including how often each word appears across the different datasets). I then used this information to create two columns in my DataFrame, one holding the word index and one holding how often that word was used:

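def buildWeightList(indexes, tokenizer):
    # Map each token index to the count stored in tokenizer.index_docs.
    weights = []
    for index in indexes:
        if index == 0:
            # Index 0 is the padding value, so it gets a weight of 0.
            weights.append(0)
        else:
            weights.append(tokenizer.index_docs.get(index))
    return weights

# ts is the tokenizer that was fitted in the first step.
street_tokenized = ts.texts_to_sequences(data['STREETPRO'])
# Keep a single token per row (maxlen=1).
data['STREETPRO'] = tf.keras.preprocessing.sequence.pad_sequences(street_tokenized, maxlen=1)
data['STREETFREQ'] = buildWeightList(data['STREETPRO'], ts)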

After converting the DataFrame to a TensorFlow Dataset, I used the following code to add it to my feature columns:

vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()
street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='STREETPRO', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

weighted_street = tf.feature_column.weighted_categorical_column(categorical_column=street_voc, weight_feature_key='STREETFREQ', dtype=tf.dtypes.int64)
street_one_hot = tf.feature_column.indicator_column(weighted_street)

feature_columns.append(street_one_hot)

As you can see, I used the function tf.feature_column.weighted_categorical_column. Unfortunately, I get the following error when I try to train my model:

InvalidArgumentError:  indices and values rows (indexing dimension) must match. (indices = 5, values = 1)
     [[node sequential/dense_features_2/STREETPRO_weighted_by_STREETFREQ_indicator/SparseMerge/SparseReorder (defined at <ipython-input-40-964101dd1dc8>:3) ]] [Op:__inference_train_function_986]

Furthermore, I get the following warning:

WARNING:tensorflow:From ...\feature_column\feature_column_v2.py:4366: sparse_merge (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.

Now I have two questions:

First: Does it make sense to use this function for the problem I described? Unfortunately, I couldn’t find a detailed description of how this function works (only this short documentation: https://www.tensorflow.org/api_docs/python/tf/feature_column/weighted_categorical_column)

Second: How can I fix the error described above?

Upvotes: 0

Views: 220

Answers (1)

John Ryan

Reputation: 11

Chiming in a year later to report that I had the same problem and "solved" it by eliminating any examples with zero weights. It might be an issue with TensorFlow converting something to a sparse representation and omitting the zeros.
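
To make that guess concrete, here is a rough pure-Python sketch of how dropping zeros during a dense-to-sparse conversion could produce mismatched counts like the ones in the error (indices = 5, values = 1). The helper to_sparse, the example batch, and the "ignore" values are all assumptions made for illustration; this is not TensorFlow's actual code, and I have not confirmed that this is what happens internally.

```python
# Illustrative only: a pure-Python stand-in for a dense-to-sparse conversion
# that drops entries equal to an "ignore" value. This is an assumption about
# what might happen inside the feature column, not TensorFlow's actual code.

def to_sparse(dense, ignore_value):
    """Return (indices, values) for entries that differ from ignore_value."""
    indices = [i for i, v in enumerate(dense) if v != ignore_value]
    values = [dense[i] for i in indices]
    return indices, values


# A hypothetical batch of five rows; only one row has a real street token.
street_ids = [0, 0, 3, 0, 0]    # STREETPRO (0 = padding)
street_freq = [0, 0, 7, 0, 0]   # STREETFREQ (0 was appended for index 0)

# Assumed: ids are only dropped when they equal -1, so all five survive.
id_indices, id_values = to_sparse(street_ids, ignore_value=-1)
# Assumed: weights equal to 0 are dropped, so only one survives.
weight_indices, weight_values = to_sparse(street_freq, ignore_value=0)

print(len(id_indices), len(weight_values))  # 5 1

# Workaround from this answer: keep only rows with a non-zero weight,
# so the ids and the weights end up with the same number of entries.
kept = [(i, w) for i, w in zip(street_ids, street_freq) if w != 0]
print(kept)  # [(3, 7)]
```

With the DataFrame from the question, the equivalent filter would be something like data = data[data['STREETFREQ'] != 0], applied before converting to a Dataset. Note that this removes every row whose street was padded to index 0, so it only makes sense if losing those examples is acceptable.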

Upvotes: 1
