K.Alan

Reputation: 200

How to input data in TensorFlow?

I have 4 files: train.txt, trainLabel.txt, test.txt, and testLabel.txt.

train.txt

1,60,feature_col0,feature_col1,feature_col2,feature_col3,feature_col4,feature_col5,feature_col6,feature_col7,feature_col8,feature_col9,feature_col10,feature_col11,feature_col12,feature_col13,feature_col14,feature_col15,feature_col16,feature_col17,feature_col18,feature_col19,feature_col20,feature_col21,feature_col22,feature_col23,feature_col24,feature_col25,feature_col26,feature_col27,feature_col28,feature_col29,feature_col30,feature_col31,feature_col32,feature_col33,feature_col34,feature_col35,feature_col36,feature_col37,feature_col38,feature_col39,feature_col40,feature_col41,feature_col42,feature_col43,feature_col44,feature_col45,feature_col46,feature_col47,feature_col48,feature_col49,feature_col50,feature_col51,feature_col52,feature_col53,feature_col54,feature_col55,feature_col56,feature_col57,feature_col58,feature_col59
1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1

trainLabel.txt

1,4,feature_col0,feature_col1,feature_col2,feature_col3
1,1,1,0

test.txt

1,60,feature_col0,feature_col1,feature_col2,feature_col3,feature_col4,feature_col5,feature_col6,feature_col7,feature_col8,feature_col9,feature_col10,feature_col11,feature_col12,feature_col13,feature_col14,feature_col15,feature_col16,feature_col17,feature_col18,feature_col19,feature_col20,feature_col21,feature_col22,feature_col23,feature_col24,feature_col25,feature_col26,feature_col27,feature_col28,feature_col29,feature_col30,feature_col31,feature_col32,feature_col33,feature_col34,feature_col35,feature_col36,feature_col37,feature_col38,feature_col39,feature_col40,feature_col41,feature_col42,feature_col43,feature_col44,feature_col45,feature_col46,feature_col47,feature_col48,feature_col49,feature_col50,feature_col51,feature_col52,feature_col53,feature_col54,feature_col55,feature_col56,feature_col57,feature_col58,feature_col59
0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1

testLabel.txt

1,4,feature_col0,feature_col1,feature_col2,feature_col3
1,1,0,0

dpNum means feature_col

I want to feed in data like the row in train.txt:

[1, 0, ..........., 1] # a rank 1 tensor; this is a vector with shape [60]

and predict an output like:

[1,0,0,1] # a rank 1 tensor; this is a vector with shape [4]

Upvotes: 0

Views: 902

Answers (1)

kafman

Reputation: 2860

From the tutorials page:

# Fit model.
classifier.fit(x=training_set.data,
               y=training_set.target,
               steps=2000)

That is, you can access the targets via training_set.target, which should give you the label for each data point.
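To make the data/target pairing concrete for the files in the question, here is a minimal sketch in plain Python (no TensorFlow required). It assumes the first header field is the row count, which is how the files in the question appear to be laid out. `Dataset` and `parse_with_header` are hypothetical names used for illustration, not part of any library, and the file contents are inlined as strings so the snippet runs on its own.

```python
import csv
import io
from collections import namedtuple

# Hypothetical container mirroring the `.data` / `.target` attributes used above.
Dataset = namedtuple("Dataset", ["data", "target"])


def parse_with_header(text):
    """Parse CSV text with one header line; return the data rows as lists of ints.

    Assumption: the first header field is the number of data rows,
    as in the files shown in the question.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    n_rows = int(header[0])
    rows = [[int(value) for value in row] for row in reader if row]
    return rows[:n_rows]


# Contents of train.txt and trainLabel.txt from the question, inlined as strings.
train_text = (
    "1,60," + ",".join("feature_col%d" % i for i in range(60)) + "\n"
    "1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,"
    "0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1\n"
)
label_text = "1,4,feature_col0,feature_col1,feature_col2,feature_col3\n1,1,1,0\n"

training_set = Dataset(
    data=parse_with_header(train_text),
    target=parse_with_header(label_text),
)

# One feature row and its matching label row.
print(len(training_set.data), training_set.target)
```

With the real files, the inlined strings would be replaced by something like `open("train.txt").read()`. Row i of `training_set.data` then pairs with row i of `training_set.target`.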

Also, I am not sure whether some terminology got mixed up: you say that the training dataset has 15'000 data points but only 1'000 labels, which (at least for the Iris dataset) does not make much sense, as I believe the whole dataset is labeled. Did you mean to say that you have 15'000 training samples and 1'000 test samples?

I am not sure whether all of the following is already clear to you, but if not, hopefully it helps. Say the Iris dataset looks something like this (taken from Wikipedia):

Sepal length    Sepal width     Petal length    Petal width     Species
5.1             3.5             1.4             0.2             I. setosa
4.9             3.0             1.4             0.2             I. setosa
4.7             3.2             1.3             0.2             I. setosa
....
5.1             2.5             3.0             1.1             I. versicolor
5.7             2.8             4.1             1.3             I. versicolor

The following terminology is commonly used:

  • Each row in the table is a data point or sample.
  • The dimensionality of a data point is 4 in this case (the 4 features: sepal length, sepal width, petal length and petal width).
  • The label or target is the last column in the above table (I. setosa or I. versicolor). Usually, labels are encoded somehow, e.g. the label is 0 for I. setosa and 1 otherwise, as you hint at in your question. There can be more than two possible labels, though: the Iris dataset usually also includes a third species, I. virginica.
  • The training set and the test set look exactly the same, except that the test set is usually smaller (and you only use the test set's labels to evaluate the final output of your classifier).
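As a rough illustration of the terms above, the following sketch splits the rows from the table into features and integer-encoded labels, then holds out one sample as a test set. The integer mapping in `label_ids` is an assumption (any consistent encoding works), and holding out the last row is only for illustration; a real split would normally shuffle the data first.

```python
# Rows copied from the table above: four features followed by the species name.
rows = [
    (5.1, 3.5, 1.4, 0.2, "I. setosa"),
    (4.9, 3.0, 1.4, 0.2, "I. setosa"),
    (4.7, 3.2, 1.3, 0.2, "I. setosa"),
    (5.1, 2.5, 3.0, 1.1, "I. versicolor"),
    (5.7, 2.8, 4.1, 1.3, "I. versicolor"),
]

# Assumed encoding: one integer per species.
label_ids = {"I. setosa": 0, "I. versicolor": 1, "I. virginica": 2}

# Each data point has dimensionality 4; the label is the encoded last column.
features = [list(row[:4]) for row in rows]
labels = [label_ids[row[4]] for row in rows]

# Hold out the last sample as a tiny test set (illustration only).
train_x, test_x = features[:-1], features[-1:]
train_y, test_y = labels[:-1], labels[-1:]

print(labels)
print(len(train_x), len(test_x))
```

The training and test parts have the same structure; `test_y` would only be used to score the classifier's predictions on `test_x`.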

Upvotes: 1
