Neergaard

Reputation: 454

Splitting data in training/validation in Tensorflow CIFAR-10 tutorial

I am confused about how I should implement validation in the CIFAR-10 TensorFlow tutorial.

I am running the CIFAR-10 model located at https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10.

Let's assume that I have a bunch of files that I want to shuffle and split into training and validation data for each epoch of training (by epoch I mean one round through the entire dataset, training and validation).

That is, I would run the training, then run the validation after the training has completed, and after that reshuffle the data files and split them into new training and validation sets.

I suspect that the way to do it probably involves the _LoggerHook object:

class _LoggerHook(tf.train.SessionRunHook):
    """Logs loss and runtime."""

    def begin(self):
        self._step = -1
        self._start_time = time.time()

    def before_run(self, run_context):
        self._step += 1
        return tf.train.SessionRunArgs(loss)  # Asks for loss value.

    def after_run(self, run_context, run_values):
        if self._step % FLAGS.log_frequency == 0:
            current_time = time.time()
            duration = current_time - self._start_time
            self._start_time = current_time

            loss_value = run_values.results
            examples_per_sec = FLAGS.log_frequency * FLAGS.batch_size / duration
            sec_per_batch = float(duration / FLAGS.log_frequency)

            format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                              'sec/batch)')
            print(format_str % (datetime.now(), self._step, loss_value,
                                    examples_per_sec, sec_per_batch))

Since this hook is already keeping track of the steps, it seems like a natural place to hook in, but how do I deliver the correct queue of files for each epoch?
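To make the behaviour I'm after concrete, here is a minimal stdlib-only sketch of the per-epoch reshuffle I have in mind; the file names and the 80/20 split are placeholders, not part of the actual tutorial code:

```python
import random

def split_for_epoch(filenames, val_fraction=0.2, seed=None):
    """Shuffle the file list and split it into (train, val) sublists."""
    files = list(filenames)
    random.Random(seed).shuffle(files)  # seed per epoch for reproducibility
    n_val = max(1, int(len(files) * val_fraction))
    return files[n_val:], files[:n_val]

# One fresh split per epoch: reshuffle, train, then validate.
all_files = ['data_batch_%d.bin' % i for i in range(1, 6)]
for epoch in range(3):
    train, val = split_for_epoch(all_files, val_fraction=0.2, seed=epoch)
    # feed `train` to the training input pipeline and `val` to the eval pass
```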

Any help or pointers in the right direction would be awesome.

Upvotes: 1

Views: 3947

Answers (1)

Aditya

Reputation: 2520

Something like the following should work (note that tf.split_v was renamed to tf.split in later TensorFlow releases):

shuffled = tf.random_shuffle(data)
train, val = tf.split(shuffled, [num_train, num_val])

Or try this one (my favourite). Scikit-learn's train_test_split, from sklearn.model_selection, is specifically designed to split your data into train and test sets randomly and by percentage:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.33, random_state=42)
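If scikit-learn is not available, the same idea is just a seeded permutation followed by a slice. A minimal numpy-only sketch (the function name is mine, and it mimics rather than reproduces scikit-learn's exact behaviour):

```python
import numpy as np

def simple_train_test_split(features, labels, test_size=0.33, random_state=None):
    """Shuffle indices with a fixed seed and slice off a test portion."""
    n = len(features)
    rng = np.random.RandomState(random_state)
    idx = rng.permutation(n)                # random order of row indices
    n_test = int(np.ceil(n * test_size))    # size of the held-out slice
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (features[train_idx], features[test_idx],
            labels[train_idx], labels[test_idx])
```

Because the permutation is driven by random_state, re-running with the same seed reproduces the same split, which is what makes random_state=42 above useful for debugging.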

Upvotes: 4

Related Questions