P-Gn

Reputation: 24621

How to get the dataset size of a Caffe net in python?

I looked at the Python example for LeNet and saw that the number of iterations needed to run over the entire MNIST test dataset is hard-coded. Can this value be determined instead of hard-coded? How can I get the number of samples in the dataset a network points to, in Python?

Upvotes: 2

Views: 1280

Answers (2)

Shai

Reputation: 114866

You can use the lmdb library to access the LMDB database directly:

import lmdb

# Requires the py-lmdb package (pip install lmdb)
db = lmdb.open('/path/to/lmdb_folder', readonly=True)
num_examples = int(db.stat()['entries'])

Should do the trick for you.

Upvotes: 3

avtomaton

Reputation: 4894

It seems that you have mixed up iterations and the number of samples in one question. In the provided example we only see the number of iterations, i.e. how many times the training phase will be repeated. There is no direct relationship between the number of iterations (a network training parameter) and the number of samples in the dataset (the network input).

Some more detailed explanation:

EDIT: Caffe will load (batch size x iterations) samples in total for training or testing, but there is no relation between the number of loaded samples and the actual database size: it will start reading from the beginning again after reaching the database's last record - in other words, the database in Caffe acts like a circular buffer.
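As a sketch of that circular-buffer behavior (the dataset size, batch size, and iteration count below are hypothetical, not taken from the MNIST example):

```python
# Sketch of how Caffe's data layer wraps around the database.
# All numbers here are hypothetical.
num_samples = 10000   # records in the database
batch_size = 100
iterations = 150      # deliberately more batches than one pass over the data

# Total samples read is batch_size * iterations, regardless of database size.
total_loaded = batch_size * iterations

# How many full passes (epochs) over the data that amounts to.
epochs_completed = total_loaded / float(num_samples)

# Reads wrap around: the record index of the first sample of the last batch.
first_record_of_last_batch = (batch_size * (iterations - 1)) % num_samples

print(total_loaded, epochs_completed, first_record_of_last_batch)
```

With these numbers, 15000 samples are read in 1.5 passes, and the final batch starts again from record 4900 rather than stopping at the end of the database.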

The mentioned example points to this configuration. We can see that it expects lmdb input, and sets the batch size to 64 (some more info about batches and BLOBs) for the training phase and 100 for the testing phase. We really make no assumption about the input dataset size, i.e. the number of samples in the dataset: the batch size is only the processing chunk size, and iterations is how many batches Caffe will take. It won't stop after reaching the end of the database.

In other words, the network itself (i.e. the protobuf config files) doesn't point to any number of samples in the database - only to the dataset name and format and the desired batch size. As far as I know, there is currently no way to determine the database size with Caffe itself.

Thus, if you want to load the entire dataset for testing, your only option is to first determine the number of samples in mnist_test_lmdb or mnist_train_lmdb manually, and then specify corresponding values for the batch size and iterations.

You have some options for this:

  1. Look at the ./examples/mnist/create_mnist.sh console output - it prints the number of samples while converting from the initial format (I believe you followed this tutorial);
  2. Follow @Shai's advice (read the lmdb database directly).
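Combining option 2 with the batch-size/iterations arithmetic above, a small helper could look like this (a sketch, assuming the py-lmdb package; the helper names are mine, not part of Caffe):

```python
import math


def lmdb_sample_count(lmdb_path):
    """Count the records in an LMDB database (one record per sample)."""
    import lmdb  # needs the py-lmdb package (pip install lmdb)
    with lmdb.open(lmdb_path, readonly=True) as db:
        return int(db.stat()['entries'])


def full_pass_iterations(num_samples, batch_size):
    """Smallest iteration count so batch_size * iterations covers every sample."""
    return int(math.ceil(num_samples / float(batch_size)))


# e.g. for the MNIST test set:
#   n = lmdb_sample_count('examples/mnist/mnist_test_lmdb')
#   test_iter = full_pass_iterations(n, 100)
```

Note that unless the batch size divides the dataset size evenly, the last batch will wrap around and re-read some samples from the beginning of the database.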

Upvotes: 1
