Reputation: 23
I have a tf dataset called train_ds:
directory = 'Data/dataset_train'
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    validation_split=0.2,
    subset="training",
    color_mode='grayscale',
    seed=123,
    image_size=(28, 28),
    batch_size=32)
The dataset contains 20000 "Fake" images and 20000 "Real" images. I want to extract X_train and y_train as NumPy arrays from this tf dataset, but so far I have only managed to get the labels out with:
y_train = np.concatenate([y for x, y in train_ds], axis=0)
I also tried this, but it doesn't seem to iterate through the 20000 images:
for images, labels in train_ds.take(-1):
    X_train = images.numpy()
    y_train = labels.numpy()
I really want to extract the images to X_train and the labels to y_train but I can't figure it out! I apologize in advance for any mistake I've made and appreciate all the help I can get :)
Upvotes: 2
Views: 2110
Reputation: 396
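You can use the tf.data.Dataset method unbatch() to unbatch the dataset, after which you can easily retrieve the images and the labels one element at a time:
data, targets = [], []
for image, label in train_ds.unbatch():
    data.append(image.numpy())      # one image per iteration
    targets.append(label.numpy())   # the matching label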
Upvotes: 1
Reputation: 5079
If you did not apply further transformations to the dataset, it will be a BatchDataset. You can iterate over it and collect the batches in two lists. In my case there are 2936 images in total.
x_train, y_train = [], []
for images, labels in train_ds:
    x_train.append(images.numpy())   # one batch of images
    y_train.append(labels.numpy())   # one batch of labels
len(x_train)  # 92 (the number of batches, not images)
The loop yields batches, not single images. You can use np.concatenate to join them.
x_train = np.concatenate(x_train, axis=0)
y_train = np.concatenate(y_train, axis=0)
x_train.shape  # (2936, 28, 28, 3)
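To see why concatenation is needed, here is a small NumPy-only sketch that mimics the batching. The zero-filled arrays are made-up stand-ins for the images, the sizes are taken from the numbers above, and the variable names are only illustrative. It also shows why a loop that reassigns X_train on every iteration, as in the question, ends up holding only the last (smaller) batch.

```python
import numpy as np

n_images, batch_size = 2936, 32

# Stand-in "dataset": zero-filled arrays in place of real 28x28 RGB images.
images = np.zeros((n_images, 28, 28, 3), dtype=np.float32)

# Split into consecutive batches, the way a batched dataset would yield them.
batches = [images[i:i + batch_size] for i in range(0, n_images, batch_size)]

print(len(batches))        # 92 batches
print(batches[-1].shape)   # (24, 28, 28, 3): the final batch is smaller

# Joining along axis 0 restores a single array with every image.
restored = np.concatenate(batches, axis=0)
print(restored.shape)      # (2936, 28, 28, 3)
```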
Or you can unbatch the dataset and iterate over it:
x_train, y_train = [], []   # start from empty lists again
for image, label in train_ds.unbatch():
    x_train.append(image.numpy())
    y_train.append(label.numpy())
x_train = np.array(x_train)
y_train = np.array(y_train)
x_train.shape  # (2936, 28, 28, 3)
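Either way, it is probably safest to collect images and labels in the same pass. The helper below is a minimal sketch of that idea, not part of TensorFlow: batches_to_numpy is a hypothetical name, and it assumes each element of the iterable is an (images, labels) pair that np.asarray can convert, which should hold for eager tensors. The demo uses made-up NumPy batches so it runs without a dataset on disk.

```python
import numpy as np

def batches_to_numpy(batches):
    """Collect an iterable of (images, labels) batches into two arrays.

    Hypothetical helper for illustration; assumes each batch element can be
    converted with np.asarray.
    """
    xs, ys = [], []
    for images, labels in batches:
        xs.append(np.asarray(images))
        ys.append(np.asarray(labels))
    # Join the per-batch arrays along the sample axis.
    return np.concatenate(xs, axis=0), np.concatenate(ys, axis=0)

# Stand-in data: three batches of sizes 4, 4 and 2 (made up for the demo).
fake_batches = [
    (np.zeros((4, 28, 28, 1)), np.array([0, 1, 0, 1])),
    (np.zeros((4, 28, 28, 1)), np.array([1, 1, 0, 0])),
    (np.zeros((2, 28, 28, 1)), np.array([0, 1])),
]

X_train, y_train = batches_to_numpy(fake_batches)
print(X_train.shape, y_train.shape)  # (10, 28, 28, 1) (10,)
```

With the dataset from the question this would be called as batches_to_numpy(train_ds). A single pass matters because image_dataset_from_directory shuffles by default, so extracting the labels in one loop and the images in another may return them in different orders.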
Upvotes: 2