Reputation: 1037
I am getting used to the new dataset API and trying to do some time series classification. I have a dataset formatted as TFRecords with the shape (time_steps x features). I also have one label per time step, shaped (time_steps x 1).
What I want is to reformat the dataset into rolling windows of time steps, shaped (n x window_size x features), with n = time_steps - window_size + 1 (if I use a stride of 1 for the rolling window).
The labels are supposed to end up shaped (n x 1), meaning that each window takes the label of its last time step.
I already know that I can use tf.contrib.data.sliding_window_batch() to create the sliding windows for the features. However, the labels then get shaped the same way, (n x window_size x 1), and I do not know how to reduce them to one label per window.
How do I do this using the TensorFlow dataset API? https://www.tensorflow.org/programmers_guide/datasets
Thanks for your help!
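Edit: to make the transform I am after concrete, here is a small numpy sketch with made-up toy shapes (10 time steps, 3 features, window_size = 4):

```python
import numpy as np

time_steps, features, window_size = 10, 3, 4
data = np.arange(time_steps * features).reshape(time_steps, features)
labels = np.arange(time_steps).reshape(time_steps, 1)

# Rolling windows with stride 1: n = time_steps - window_size + 1 windows.
n = time_steps - window_size + 1
windows = np.stack([data[i:i + window_size] for i in range(n)])
window_labels = np.stack([labels[i + window_size - 1] for i in range(n)])

print(windows.shape)        # (7, 4, 3)  -> (n, window_size, features)
print(window_labels.shape)  # (7, 1)     -> one label per window: the last step's
```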
Upvotes: 0
Views: 609
Reputation: 321
I have a slow solution with TF 1.13:

import tensorflow as tf  # TF 1.x graph-mode API

WIN_SIZE = 5000

# data1: (time_steps, features), data2: (time_steps, 1)
dataset_input = tf.data.Dataset.from_tensor_slices(data1) \
    .window(size=WIN_SIZE, shift=WIN_SIZE, drop_remainder=False) \
    .flat_map(lambda x: x.batch(WIN_SIZE))

dataset_label = tf.data.Dataset.from_tensor_slices(data2) \
    .window(size=WIN_SIZE, shift=WIN_SIZE, drop_remainder=False) \
    .flat_map(lambda x: x.batch(WIN_SIZE)) \
    .map(lambda x: x[-1])  # keep only the last label of each window

dataset = tf.data.Dataset.zip((dataset_input, dataset_label))
dataset = dataset.repeat(1)

data_iter = dataset.make_one_shot_iterator()  # create the iterator
next_sample = data_iter.get_next()

with tf.Session() as sess:
    i = 0
    while True:
        try:
            r_ = sess.run(next_sample)
            i += 1
            print(i)
            print(r_)
            print(r_[0].shape)
            print(r_[1].shape)
        except tf.errors.OutOfRangeError:
            print('end')
            break
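For intuition, the label pipeline (window → flat_map(batch) → map(x[-1])) just keeps the last label of each non-overlapping chunk of WIN_SIZE elements. A minimal numpy sketch of the same selection, with toy values I made up:

```python
import numpy as np

WIN_SIZE = 5
data2 = np.arange(23)  # 23 toy labels; the final chunk is partial (drop_remainder=False)

# Equivalent of window(...).flat_map(lambda x: x.batch(WIN_SIZE)).map(lambda x: x[-1]):
# the last element of each non-overlapping chunk of up to WIN_SIZE labels.
chunk_ends = [int(data2[i:i + WIN_SIZE][-1]) for i in range(0, len(data2), WIN_SIZE)]
print(chunk_ends)  # [4, 9, 14, 19, 22]
```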
The reason I call it a 'slow solution' is that the snippet below could probably be optimized further, which I have not finished yet:
dataset_label = tf.data.Dataset.from_tensor_slices(data2) \
    .window(size=WIN_SIZE, shift=WIN_SIZE, drop_remainder=False) \
    .flat_map(lambda x: x.batch(WIN_SIZE)) \
    .map(lambda x: x[-1])
A more promising solution might find some 'skip'-style operation to jump over the unneeded values in dataset_label, rather than the 'window' operation used here.
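As a sketch of that skip idea in numpy (toy values; note that plain striding drops the label of a final partial window, unlike drop_remainder=False above):

```python
import numpy as np

WIN_SIZE = 5
data2 = np.arange(23)  # toy labels

# Every WIN_SIZE-th label starting at index WIN_SIZE - 1,
# i.e. the label that closes each *full* window.
sparse_labels = data2[WIN_SIZE - 1::WIN_SIZE]
print(sparse_labels)  # [ 4  9 14 19]
```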
Upvotes: 1
Reputation: 1037
I couldn't figure out how to do this with the dataset API, so I decided to just do it in numpy.
I found this great answer and applied it to my case.
Afterwards it was just a matter of using numpy, like so:
train_df2 = window_nd(train_df, 50, steps=1, axis=0)
train_features = train_df2[:, :, :-1]
train_labels = train_df2[:, :, -1:].squeeze()[:, -1:]
train_labels.shape
My label was the last column, so you might have to adjust this a bit.
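On recent numpy (1.20+), numpy.lib.stride_tricks.sliding_window_view can stand in for the linked window_nd helper. A sketch with made-up toy data (3 feature columns plus the label as the last column):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window = 5
train_df = np.arange(32, dtype=float).reshape(8, 4)  # 8 steps, 3 features + label column

# sliding_window_view puts the window axis last, so move it to the middle:
train_df2 = sliding_window_view(train_df, window, axis=0).transpose(0, 2, 1)
train_features = train_df2[:, :, :-1]                  # (n, window, features)
train_labels = train_df2[:, :, -1:].squeeze()[:, -1:]  # (n, 1): each window's last label
print(train_features.shape, train_labels.shape)  # (4, 5, 3) (4, 1)
```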
Upvotes: 1