Dataset.from_generator cannot replicate functionality of numpy arrays as input to 1D Convnet

Question

I am feeding many time series, each of length 100 with 3 features, into a 1D Convnet. I have too many of these to hold in numpy arrays, so I need to use Dataset.from_generator().

The problem is that when I train the model on the dataset, it gives the error:

expected conv1d_input to have 3 dimensions, but got array with shape (100, 3)

The code below demonstrates the problem. The generator produces each element as a (100, 3) array, as expected. Why does the model not recognise the generator output as valid?

Many thanks for any help. Julian

import numpy as np
import tensorflow as tf
def create_timeseries_element():
    # returns a random time series of 100 intervals, each with 3 features,
    # and a random one-hot array of 5 entries
    data = np.random.rand(100,3)
    label = np.eye(5, dtype='int')[np.random.choice(5)]
    return data, label

def data_generator():
    d, l = create_timeseries_element()
    yield (d, l)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(128, 9, activation='relu', input_shape=(100, 3)),
    tf.keras.layers.Conv1D(128, 9, activation='relu'),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(256, 5, activation='relu'),
    tf.keras.layers.Conv1D(256, 5, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(5, activation='softmax')])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

x_train = []
y_train = []
for _ in range(1000):
    d, l = create_timeseries_element()
    x_train.append(d)
    y_train.append(l)
x_train = np.array(x_train)
y_train = np.array(y_train)

# train model with numpy arrays - this works
model.fit(x=x_train, y=y_train)

ds = tf.data.Dataset.from_generator(data_generator, output_types=(tf.float32, tf.int32),
                                      output_shapes=(tf.TensorShape([100, 3]), tf.TensorShape([5])))
# train model with dataset - this fails
model.fit(ds)

Kaushik Roy · Accepted Answer

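The model expects a batch of samples, i.e. input of shape (batch_size, 100, 3), whereas your dataset yields individual samples of shape (100, 3). When you pass numpy arrays, model.fit splits them into batches for you; a Dataset has to be batched already. You can do that by calling batch() on the dataset after creating it: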

ds = tf.data.Dataset.from_generator(data_generator, output_types=(tf.float32, tf.int32),
                                      output_shapes=(tf.TensorShape([100, 3]), tf.TensorShape([5])))
ds = ds.batch(16)
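To make the shape mismatch concrete, here is a minimal NumPy-only sketch (no TensorFlow involved, so it only illustrates the shapes, not what Keras does internally). The variable names are chosen for illustration and the batch size of 16 simply mirrors the call above:

```python
import numpy as np

# One element as yielded by the generator in the question: (timesteps, features).
sample = np.random.rand(100, 3)

# Stacking 16 such samples adds the leading batch axis that Conv1D expects:
# (batch, timesteps, features). This is the shape a batched dataset element has.
batch_of_16 = np.stack([np.random.rand(100, 3) for _ in range(16)])

# A single sample can also be turned into a batch of one by adding a leading axis.
batch_of_1 = np.expand_dims(sample, axis=0)

print(sample.ndim, sample.shape)            # 2 (100, 3)
print(batch_of_16.ndim, batch_of_16.shape)  # 3 (16, 100, 3)
print(batch_of_1.ndim, batch_of_1.shape)    # 3 (1, 100, 3)
```

The error message in the question is complaining about the first case: a 2-dimensional array where a 3-dimensional one is required.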

Alternatively, you can add the batch dimension when you prepare each sample. Expand the sample's dimensions so that a single sample acts as a batch of one (you can also yield a stack of several samples), and change create_timeseries_element and the dataset's output_shapes accordingly:

def create_timeseries_element():
    # returns a random time series of 100 intervals, each with 3 features,
    # and a random one-hot array of 5 entries
    # Expand dimensions to create a batch of single sample
    data = np.expand_dims(np.random.rand(100, 3), axis=0)
    label = np.expand_dims(np.eye(5, dtype='int')[np.random.choice(5)], axis=0)
    return data, label

ds = tf.data.Dataset.from_generator(data_generator, output_types=(tf.float32, tf.int32),
                                      output_shapes=(tf.TensorShape([None, 100, 3]), tf.TensorShape([None, 5])))

With either change, the generator still yields only one element per epoch: a single batch in the second solution, or a single sample in the first. To generate as many batches (or samples, in the first solution) as you want, e.g. 25, add a count parameter to data_generator and pass it through args when you define the dataset:

def data_generator(count=1):
    for _ in range(count):
        d, l = create_timeseries_element()
        yield (d, l)

ds = tf.data.Dataset.from_generator(data_generator, args=[25], output_types=(tf.float32, tf.int32),
                                      output_shapes=(tf.TensorShape([None, 100, 3]), tf.TensorShape([None, 5])))
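If you combine the count parameter with the first solution (unbatched samples plus ds.batch(16)), the 25 samples are grouped into one full batch and one smaller final batch. The sketch below imitates that grouping in plain NumPy so the shapes can be inspected without TensorFlow. Note that batch_elements is a hypothetical stand-in written for this illustration, not TensorFlow's implementation, and the generator here yields unbatched (100, 3) samples:

```python
import numpy as np

def data_generator(count=1):
    # Yields `count` unbatched samples: a (100, 3) series and a one-hot label of length 5.
    for _ in range(count):
        data = np.random.rand(100, 3).astype(np.float32)
        label = np.eye(5, dtype=np.int32)[np.random.choice(5)]
        yield data, label

def batch_elements(elements, batch_size):
    # Hypothetical stand-in for Dataset.batch: stacks consecutive elements along
    # a new leading axis and keeps a smaller final batch if elements run out.
    xs, ys = [], []
    for x, y in elements:
        xs.append(x)
        ys.append(y)
        if len(xs) == batch_size:
            yield np.stack(xs), np.stack(ys)
            xs, ys = [], []
    if xs:
        yield np.stack(xs), np.stack(ys)

batch_shapes = [(x.shape, y.shape) for x, y in batch_elements(data_generator(25), 16)]
print(batch_shapes)  # [((16, 100, 3), (16, 5)), ((9, 100, 3), (9, 5))]
```

Because the final batch can be smaller than the rest, the batch dimension is not fixed, which is why it is written as None in output_shapes.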
