Lukas Hestermeyer

Reputation: 1037

Tensorflow dataset API for time series classification

I am getting used to the new Dataset API and am trying to do some time series classification. I have a dataset stored as TFRecords with the shape (time_steps x features). I also have a label for each time step: (time_steps x 1).

What I want to do is reformat the dataset into rolling windows of time steps, like this: (n x window_size x features), with n = time_steps - window_size + 1 (if I use a stride of 1 for the rolling window).

The labels are supposed to be (n x 1), meaning that we take the label of the last time step in each window.

I already know that I can use tf.sliding_window_batch() to create the sliding windows for the features. However, the labels get windowed in the same way, ending up as (n x window_size x 1), and I do not know how to reduce them to just the last label of each window.

How do I do this using the tensorflow dataset API? https://www.tensorflow.org/programmers_guide/datasets
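To make the target shapes concrete, here is a small numpy sketch (the array sizes are made up for illustration):

```python
import numpy as np

time_steps, features, window_size = 10, 3, 4
data = np.arange(time_steps * features).reshape(time_steps, features)
labels = np.arange(time_steps).reshape(time_steps, 1)

# with stride 1, n = time_steps - window_size + 1 windows
n = time_steps - window_size + 1
windows = np.stack([data[i:i + window_size] for i in range(n)])
# one label per window: the label of the last time step in that window
window_labels = np.stack([labels[i + window_size - 1] for i in range(n)])

print(windows.shape)        # (7, 4, 3) -> (n, window_size, features)
print(window_labels.shape)  # (7, 1)    -> (n, 1)
```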

Thanks for your help!

Upvotes: 0

Views: 609

Answers (2)

Napoléon

Reputation: 321

I have a slow solution with TF 1.13.

    WIN_SIZE = 5000

    dataset_input = tf.data.Dataset.from_tensor_slices(data1) \
        .window(size=WIN_SIZE, shift=WIN_SIZE, drop_remainder=False) \
        .flat_map(lambda x: x.batch(WIN_SIZE))

    dataset_label = tf.data.Dataset.from_tensor_slices(data2) \
        .window(size=WIN_SIZE, shift=WIN_SIZE, drop_remainder=False) \
        .flat_map(lambda x: x.batch(WIN_SIZE)) \
        .map(lambda x: x[-1])

    dataset = tf.data.Dataset.zip((dataset_input, dataset_label))
    dataset = dataset.repeat(1)
    data_iter = dataset.make_one_shot_iterator()  # create the iterator
    next_sample = data_iter.get_next()

    with tf.Session() as sess:
        i = 0
        while True:
            try:
                r_ = sess.run(next_sample)
                i += 1
                print(i)
                print(r_)
                print(r_[0].shape)
                print(r_[1].shape)
            except tf.errors.OutOfRangeError:
                print('end')
                break

The reason I call it a 'slow solution' is that the snippet below can probably be optimized, which I have not finished yet:

    dataset_label = tf.data.Dataset.from_tensor_slices(data2) \
        .window(size=WIN_SIZE, shift=WIN_SIZE, drop_remainder=False) \
        .flat_map(lambda x: x.batch(WIN_SIZE)) \
        .map(lambda x: x[-1])

A more promising solution might use some 'skip' operation to skip the unneeded values in dataset_label, rather than the 'window' operation used now.
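One way to realize that 'skip' idea (a sketch, not benchmarked): with shift equal to WIN_SIZE, the labels we keep sit at indices WIN_SIZE-1, 2*WIN_SIZE-1, and so on. The index logic is shown with a plain numpy slice; in tf.data the same selection could be written as `tf.data.Dataset.from_tensor_slices(data2).skip(WIN_SIZE - 1).shard(WIN_SIZE, 0)`. Note this matches full windows only; with drop_remainder=False a trailing partial window would additionally contribute its own last element, which this selection does not cover.

```python
import numpy as np

WIN_SIZE = 5           # small size for illustration
data2 = np.arange(23)  # 23 dummy labels

# label of the last step of each *full* window (shift == WIN_SIZE):
# indices WIN_SIZE-1, 2*WIN_SIZE-1, ...
wanted = data2[WIN_SIZE - 1::WIN_SIZE]
print(wanted)  # [ 4  9 14 19]
```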

Upvotes: 1

Lukas Hestermeyer

Reputation: 1037

I couldn't figure out how to do this, but I figured I might as well do it using numpy.

I found this great answer and applied it to my case.

Afterwards it was just a matter of using numpy, like so:

    train_df2 = window_nd(train_df, 50, steps=1, axis=0)
    train_features = train_df2[:, :, :-1]
    train_labels = train_df2[:, :, -1:].squeeze()[:, -1:]
    train_labels.shape

My label was the last column, so you might have to adjust this a bit.
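Since the linked answer is not reproduced here, `window_nd` above is a helper taken from it. As a rough idea of what such a rolling-window helper can look like (a sketch for axis 0 only, assuming numpy >= 1.20 for `sliding_window_view`, and assuming it returns shape (n, window, columns)):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_nd_sketch(a, window, steps=1):
    """Rolling windows of `window` rows along axis 0, taking every `steps`-th window."""
    v = sliding_window_view(a, window, axis=0)  # (n, columns, window)
    v = np.moveaxis(v, -1, 1)                   # (n, window, columns)
    return v[::steps]

# dummy frame: 12 time steps, 4 feature columns + 1 label column
train_df = np.arange(60.).reshape(12, 5)
train_df2 = window_nd_sketch(train_df, 4)
train_features = train_df2[:, :, :-1]
train_labels = train_df2[:, :, -1:].squeeze()[:, -1:]

print(train_df2.shape)       # (9, 4, 5)
print(train_features.shape)  # (9, 4, 4)
print(train_labels.shape)    # (9, 1)
```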

Upvotes: 1
