Dealing with batch size and time step in 1D CNN

Question

I have a batch generator which gives me data in the shape of (500, 1, 12) (i.e. corresponding to (batch size, time steps, features)).

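import numpy as np

def batch_generator(batch_size, gen_x, gen_y):
    while True:
        # Allocate fresh arrays for every batch so that a batch already
        # handed to the consumer is not overwritten in place.
        batch_features = np.zeros((batch_size, 1, 12))
        batch_labels = np.zeros((batch_size, 9))
        for i in range(batch_size):
            batch_features[i] = next(gen_x)
            batch_labels[i] = next(gen_y)
        yield batch_features, batch_labels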

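def generate_X():
    while True:
        with open("/my_path/my_data.csv") as f:
            for line in f:
                # Strip the trailing newline and split into the 12 fields
                currentline = line.rstrip('\n').split(",")
                # Parse the fields as floats rather than keeping strings
                currentline = np.asarray(currentline, dtype=float)
                currentline = currentline.reshape(1, 1, 12)
                yield currentline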

def generate_y():
    # y_train is assumed to be defined elsewhere (one label row per CSV line)
    while True:
        for y in y_train:
            yield y

I then try to feed this into a 1D-CNN:

from keras.models import Sequential
from keras.layers import Conv1D

model = Sequential()
model.add(Conv1D(filters=100, kernel_size=1, activation='relu', input_shape=(1, 12), data_format="channels_last"))

But now I am not able to use a kernel size of more than 1 (i.e. I am stuck with kernel_size = 1). This is probably because my number of time steps is equal to 1.

How can I use the whole batch size as input to the 1D-CNN and increase the kernel_size?

Accepted Answer

Keep in mind that 1D convolution is meant for inputs where each sample is a sequence, i.e. data in which the order of the values matters, such as stock market values over a week, temperature readings over a month, a genome sequence, or a sequence of words. With that said, considering your data, there are three different scenarios:

If each line in your csv file is a sequence of length 12, then you are dealing with samples of shape (12,1), i.e. each sample has 12 timesteps and each timestep has only one feature. So you should reshape it accordingly (i.e. to (12,1) and not to (1,12)).
However, if each line is not a sequence by itself, but a group of consecutive lines forms a sequence, then you must generate your data accordingly: each sample would consist of multiple consecutive lines, e.g. if we consider the number of timesteps to be 10, then lines #1 to #10 would be one sample, lines #2 to #11 would be another sample, and so on. In this case each sample would have a shape of (number_of_timesteps, 12) (in the example I mentioned it would be (10,12)). You can create and generate these samples by writing a custom function, or alternatively you could load all of the data as a numpy array and then use Keras's TimeseriesGenerator to do it for you.
If neither of the two cases above applies, then it's very likely that your data is not sequential at all, and therefore using a 1D-CNN (or any other sequence-processing model such as an RNN) does not make sense for this data. Instead, you should use another suitable architecture.
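To make the first two scenarios concrete, here is a minimal NumPy sketch of the resulting shapes. The array `data` is a made-up stand-in for the contents of the csv file (100 lines of 12 values), and `make_windows` is a hypothetical helper written for this illustration, not part of Keras or NumPy.

```python
import numpy as np

# Hypothetical stand-in for the csv contents: 100 lines, 12 values per line.
data = np.arange(100 * 12, dtype=float).reshape(100, 12)

# Scenario 1: each line is itself a sequence of 12 timesteps with 1 feature.
samples_per_line = data.reshape(-1, 12, 1)

# Scenario 2: a sample is a window of consecutive lines.
def make_windows(rows, timesteps):
    """Stack overlapping windows of `timesteps` consecutive rows."""
    n_windows = len(rows) - timesteps + 1
    return np.stack([rows[i:i + timesteps] for i in range(n_windows)])

windows = make_windows(data, timesteps=10)

print(samples_per_line.shape)  # (100, 12, 1)
print(windows.shape)           # (91, 10, 12)
```

With these shapes, the Conv1D layer would take input_shape=(12, 1) in the first scenario or input_shape=(10, 12) in the second, and kernel_size can then be anything up to the number of timesteps. The labels would also have to be aligned with the windows in the second scenario, which this sketch does not cover.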
