Neo

Reputation: 4478

How to get the number of rows and columns (dimensions) of a tensorflow.data.Dataset?

Is there an equivalent of pandas_df.shape for tensorflow.data.Dataset? Thanks.

Upvotes: 4

Views: 2509

Answers (2)

Elefhead

Reputation: 160

To add to Vlad's answer: for datasets downloaded via tfds (tensorflow_datasets), one option is to read the shapes and sizes from the dataset info object:

info.features['image'].shape # shape of 1 feature in dataset
info.features['label'].num_classes # number of classes
info.splits['train'].num_examples # number of training examples

E.g. tf_flowers:

import tensorflow as tf
import tensorflow_datasets as tfds 

dataset, info = tfds.load("tf_flowers", with_info=True) # download data with info

image_size = info.features['image'].shape # (None, None, 3)
num_classes = info.features['label'].num_classes # 5
data_size = info.splits['train'].num_examples # 3670
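
The (None, None, 3) above means the image height and width vary from example to example, so concrete shapes are only available on the individual elements. A minimal sketch of checking them, assuming TensorFlow 2.4 or later with eager execution; fake_images is a hypothetical in-memory stand-in for variable-sized images, not something provided by tf_flowers or tfds:

```python
import numpy as np
import tensorflow as tf


def fake_images():
    # Hypothetical stand-in: three RGB images of different sizes.
    for height, width in [(4, 6), (5, 3), (2, 2)]:
        yield np.zeros((height, width, 3), dtype=np.uint8)


dataset = tf.data.Dataset.from_generator(
    fake_images,
    output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
)

# Static shape of one element: unknown dimensions show up as None.
static_shape = dataset.element_spec.shape.as_list()

# Concrete shapes are only known once the elements are actually read.
concrete_shapes = [tuple(image.shape.as_list()) for image in dataset]

# Counting by iteration works even when the size is not known up front.
num_examples = sum(1 for _ in dataset)

print(static_shape)
print(concrete_shapes)
print(num_examples)
```

Iterating reads every element, so on a large dataset it may be worth limiting the check to a few examples with dataset.take(n).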

E.g. fashion_mnist:

import tensorflow as tf
import tensorflow_datasets as tfds 

dataset, info = tfds.load("fashion_mnist", with_info=True) # download data with info

image_size = info.features['image'].shape # (28, 28, 1)
num_classes = info.features['label'].num_classes # 10
data_splits = {k:v.num_examples for k,v in info.splits.items()} # {'test': 10000, 'train': 60000}

Hope this helps.

Upvotes: 0

Vlad

Reputation: 8595

I'm not aware of anything built in, but the shapes can be retrieved from the Dataset._tensors attribute (an internal attribute, so it may change between TensorFlow versions). Example:

import tensorflow as tf

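def dataset_shapes(dataset):
    try:
        # Several components, e.g. (features, labels): one shape per component.
        return [x.get_shape().as_list() for x in dataset._tensors]
    except TypeError:
        # _tensors is not iterable (a single tensor): return its shape directly.
        return dataset._tensors.get_shape().as_list()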

And usage:

from sklearn.datasets import make_blobs

x_train, y_train = make_blobs(n_samples=10,
                              n_features=2,
                              centers=[[1, 1], [-1, -1]],
                              cluster_std=0.5)
dataset = tf.data.Dataset.from_tensor_slices(x_train)
print(dataset_shapes(dataset)) # [10, 2]

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
print(dataset_shapes(dataset)) # [[10, 2], [10]]
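
As an alternative that avoids the internal attribute, TensorFlow 2.x exposes element_spec and cardinality() on tf.data.Dataset. A minimal sketch, assuming TensorFlow 2.3 or later with eager execution; dataset_shape is a hypothetical helper name, and the zero arrays merely stand in for x_train and y_train:

```python
import numpy as np
import tensorflow as tf


def dataset_shape(dataset):
    """Return (number of elements, list of per-element component shapes)."""
    # cardinality() is negative when the size is infinite (-1) or unknown (-2).
    num_elements = int(dataset.cardinality().numpy())
    # element_spec describes a single element; flatten handles tuples and dicts.
    specs = tf.nest.flatten(dataset.element_spec)
    return num_elements, [spec.shape.as_list() for spec in specs]


# Stand-ins for x_train (10 rows, 2 features) and y_train (10 labels).
x = np.zeros((10, 2), dtype=np.float32)
y = np.zeros((10,), dtype=np.int64)

rows, shapes = dataset_shape(tf.data.Dataset.from_tensor_slices((x, y)))
print(rows, shapes)
```

Note that this reports the number of elements separately from the shape of one element, so the "rows" are the cardinality and the "columns" come from the first component's shape. For datasets whose cardinality is unknown (for example after filter), counting by iteration is the fallback.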

Upvotes: 1
