Madhav Bhattarai

Reputation: 867

How to convert numerical categorical data into sparse tensors in TensorFlow?

My dataset format is as shown below:

8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,4,0,1,3,2,1,2,2,2,1
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,12,0,0,5,2,2,2,2,2,1
3,1,1,2,0,0,3,2,1,2,2,2,1

It consists entirely of categorical data, with each feature coded numerically. I tried the following code:

        monthly_income = tf.contrib.layers.sparse_column_with_keys("monthly_income", keys=['1','2','3','4','5','6'])
        # Other columns are declared in the same way

        m = tf.contrib.learn.LinearClassifier(
            feature_columns=[
                caste, religion, differently_abled, nature_of_activity, school, dropout, qualification,
                computer_literate, monthly_income, smoke, drink, tobacco, sex],
            model_dir=model_dir)

But I am getting the following error:

TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int64'>.

Upvotes: 2

Views: 798

Answers (1)

sygi

Reputation: 4637

I think the problem is outside the code you have shown. My guess is that the features in the CSV file were read as ints, while the column expects strings because you passed keys=['1', '2', ...].
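To illustrate the guess above, here is a minimal sketch in plain Python (no TensorFlow) of why an integer value never matches a vocabulary of string keys. The `lookup` helper and `raw_value` are hypothetical names used only for illustration; TensorFlow itself raises the TypeError from the question rather than returning -1.

```python
# Keys declared as in the question: a vocabulary of strings.
keys = ['1', '2', '3', '4', '5', '6']

# Hypothetical parsed value: an int, as a numeric CSV reader would produce.
raw_value = 6


def lookup(value, vocabulary):
    """Return the index of value in vocabulary, or -1 if it is absent."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return -1


# The int 6 is not equal to the string '6', so the lookup fails.
int_lookup = lookup(raw_value, keys)

# Casting the value to str makes it comparable with the string keys.
str_lookup = lookup(str(raw_value), keys)

print(int_lookup, str_lookup)
```

If you want to keep sparse_column_with_keys, the same idea applies: the feature values and the keys must share one dtype.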

Either way, in this situation I recommend using sparse_column_with_integerized_feature, which takes integer ids directly:

monthly_income = tf.contrib.layers.sparse_column_with_integerized_feature("monthly_income", bucket_size=7)
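The bucket_size has to be larger than the largest id in the column, since ids are expected to lie in the range [0, bucket_size). As a rough sketch, assuming the ten sample rows from the question are representative (the full dataset may contain larger values), you could estimate a bucket_size per column with the standard library. The `bucket_sizes` helper is a hypothetical name, not part of TensorFlow.

```python
import csv
import io

# The ten sample rows from the question.
SAMPLE = """8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,4,0,1,3,2,1,2,2,2,1
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,12,0,0,5,2,2,2,2,2,1
3,1,1,2,0,0,3,2,1,2,2,2,1
"""


def bucket_sizes(csv_text):
    """Return max(column) + 1 for every column of an all-integer CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [[int(value) for value in row] for row in reader if row]
    # zip(*rows) transposes the rows into columns.
    return [max(column) + 1 for column in zip(*rows)]


sizes = bucket_sizes(SAMPLE)
print(sizes)
```

Which CSV column corresponds to monthly_income is not stated in the question, so map the resulting sizes to your own column order.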

Upvotes: 5
