Ravaal

Reputation: 3359

How do I change the dtype in TensorFlow for a csv file?

Here is the code that I am trying to run:

import tensorflow as tf
import numpy as np
import input_data

filename_queue = tf.train.string_input_producer(["cs-training.csv"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

record_defaults = [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.concat(0, [col2, col3, col4, col5, col6, col7, col8, col9, col10, col11])

with tf.Session() as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(1200):
    # Retrieve a single instance:
    print i
    example, label = sess.run([features, col1])
    try:
        print example, label
    except:
        pass

  coord.request_stop()
  coord.join(threads)

This code returns the error below.

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-23-e42fe2609a15> in <module>()
      7     # Retrieve a single instance:
      8     print i
----> 9     example, label = sess.run([features, col1])
     10     try:
     11         print example, label

/root/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict)
    343 
    344     # Run request and get response.
--> 345     results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
    346 
    347     # User may have fetched the same tensor multiple times, but we

/root/anaconda/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, target_list, fetch_list, feed_dict)
    417         # pylint: disable=protected-access
    418         raise errors._make_specific_exception(node_def, op, e.error_message,
--> 419                                               e.code)
    420         # pylint: enable=protected-access
    421       raise e_type, e_value, e_traceback

InvalidArgumentError: Field 1 in record 0 is not a valid int32: 0.766126609

The traceback has a lot more information after this, which I think is irrelevant to the problem. Clearly the problem is that much of the data I am feeding to the program is not of dtype int32; it's mostly floats. I've tried a few things to change the dtype, like explicitly setting the dtype=float argument in tf.decode_csv as well as in tf.concat, but neither worked; I still get an invalid-argument error. On top of that, I don't know whether this code will actually make a prediction on the data. I want it to predict whether col1 is going to be a 1 or a 0, and I don't see anything in the code that hints it will actually make that prediction. Maybe I'll save that question for a different thread. Any help is greatly appreciated!

Upvotes: 10

Views: 8546

Answers (2)

mrry

Reputation: 126184

The interface to tf.decode_csv() is a little tricky. The dtype of each column is determined by the corresponding element of the record_defaults argument. The value for record_defaults in your code is interpreted as each column having tf.int32 as its type, which leads to an error when it encounters floating-point data.

Let's say you have the following CSV data, containing three integer columns, followed by a floating point column:

4, 8, 9, 4.5
2, 5, 1, 3.7
2, 2, 2, 0.1

Assuming all of the columns are required, you would build record_defaults as follows:

value = ...

record_defaults = [tf.constant([], dtype=tf.int32),    # Column 0
                   tf.constant([], dtype=tf.int32),    # Column 1
                   tf.constant([], dtype=tf.int32),    # Column 2
                   tf.constant([], dtype=tf.float32)]  # Column 3

col0, col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults)

assert col0.dtype == tf.int32
assert col1.dtype == tf.int32
assert col2.dtype == tf.int32
assert col3.dtype == tf.float32
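To see why the defaults carry the type information, here is the same idea sketched in plain Python, with no TensorFlow involved. The helper decode_csv_line is hypothetical (it is not part of any library); each column is described by a (dtype, default) pair, where a default of None marks the column as required, mirroring the empty tf.constant([]) above.

```python
def decode_csv_line(line, record_defaults):
    """Hypothetical sketch: record_defaults is one (dtype, default) pair
    per column; default=None marks the column as required."""
    out = []
    for raw, (dtype, default) in zip(line.split(","), record_defaults):
        raw = raw.strip()
        if raw:
            out.append(dtype(raw))        # the declared dtype drives parsing
        elif default is not None:
            out.append(default)           # missing field: fall back to default
        else:
            raise ValueError("missing required field")
    return out

# Three required int columns, then a required float column.
defaults = [(int, None), (int, None), (int, 0), (float, None)]
print(decode_csv_line("4, 8, 9, 4.5", defaults))  # [4, 8, 9, 4.5]
print(decode_csv_line("2, 5, , 3.7", defaults))   # [2, 5, 0, 3.7]
```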

An empty value in record_defaults signifies that the value is required. Alternatively, if (e.g.) column 2 is allowed to have missing values, you would define record_defaults as follows:

record_defaults = [tf.constant([], dtype=tf.int32),     # Column 0
                   tf.constant([], dtype=tf.int32),     # Column 1
                   tf.constant([0], dtype=tf.int32),    # Column 2
                   tf.constant([], dtype=tf.float32)]   # Column 3

The second part of your question concerns how to build and train a model that predicts the value of one of the columns from the input data. Currently, the program doesn't: it simply concatenates the columns into a single tensor, called features. You will need to define and train a model that interprets that data. One of the simplest such approaches is linear regression, and you might find this tutorial on linear regression in TensorFlow adaptable to your problem.
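To give a rough sense of what "define and train a model" involves, here is a minimal sketch in plain NumPy rather than TensorFlow, using logistic regression (a natural fit since col1 is binary). All of the data below is synthetic stand-in data, not your CSV.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the parsed CSV: 10 feature columns, binary label.
X = rng.normal(size=(200, 10)).astype(np.float32)
true_w = rng.normal(size=10)
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(np.float32)

# Logistic regression trained by batch gradient descent.
w = np.zeros(10)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(label == 1)
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(np.float32)
accuracy = np.mean(preds == y)
```

The TensorFlow version of this would express the same forward pass and loss as ops and let an optimizer compute the gradients, but the structure (parameters, loss, update loop) is the same.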

Upvotes: 20

Ravaal

Reputation: 3359

The answer to changing the dtype is to just change the defaults, like so:

record_defaults = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]]

After you do that, if you print out col1, you'll see this:

Tensor("DecodeCSV_43:0", shape=TensorShape([]), dtype=float32)

But there is another error you will run into, which has been answered here. To recap that answer, the workaround is to change tf.concat to tf.pack, like so:

features = tf.pack([col2, col3, col4, col5, col6, col7, col8, col9, col10, col11])
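The reason the workaround is needed: tf.decode_csv returns scalar (rank-0) tensors, and tf.concat requires inputs of rank at least 1, whereas tf.pack stacks scalars into a new rank-1 tensor. The same distinction exists in NumPy (np.stack vs np.concatenate), which makes it easy to demonstrate:

```python
import numpy as np

# Scalars, analogous to the rank-0 tensors returned by tf.decode_csv.
cols = [np.float32(0.76), np.float32(1.5), np.float32(3.0)]

stacked = np.stack(cols)        # works: stacks scalars into a 1-D array
print(stacked.shape)            # (3,)

try:
    np.concatenate(cols)        # fails: concatenate needs rank >= 1 inputs
except ValueError as e:
    print("concatenate failed:", e)
```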

Upvotes: 1
