I have a data sequence a which is of shape [seq_len, 2], where seq_len is the length of the sequence. There is time correlation among the elements of a[:, 0] and among the elements of a[:, 1], but a[:, 0] and a[:, 1] are independent of each other. For training I prepare data of shape [batch_size, seq_len, 2]. The initialization of the BRNN that I use is:
birnn_layer = nn.RNN(input_size=2, hidden_size=100, batch_first=True, bidirectional=True)
From the docs,
input_size – The number of expected features in the input x
hidden_size – The number of features in the hidden state h
What does "number of expected features" mean? Since there is correlation along the seq_len axis, should my input_size be set to seq_len and the input be permuted? Thanks.
Upvotes: 3
Is the whole pair [c, d] time-independent from [a, b]? If there is a dependency, that is, if the data follows a concrete logic whereby [c, d] must occur at a later time or instance than [a, b], then the sequence formulation [[a,b], [c,d], ...] is correct. For your data of the form batch_size x seq_len x 2, this 2 is "the number of expected features", i.e. the input_size.
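A minimal sketch of the shapes involved, assuming PyTorch and arbitrary illustrative sizes (batch_size=4, seq_len=50) with random data standing in for the real sequences. It shows that the trailing dimension of 2 is what input_size refers to, and what the permuted layout from the question would do instead:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len = 4, 50

# Layer from the question: 2 features per time step, 100 hidden units per direction.
birnn_layer = nn.RNN(input_size=2, hidden_size=100, batch_first=True, bidirectional=True)

# Random stand-in for the real data, shape [batch_size, seq_len, 2].
x = torch.randn(batch_size, seq_len, 2)
output, h_n = birnn_layer(x)

# output: one vector per time step, forward and backward states concatenated.
print(output.shape)  # torch.Size([4, 50, 200])
# h_n ignores batch_first: [num_layers * num_directions, batch_size, hidden_size].
print(h_n.shape)     # torch.Size([2, 4, 100])

# Permuting to [batch_size, 2, seq_len] with input_size=seq_len would instead
# give the RNN a "sequence" of only 2 steps, each carrying seq_len features.
permuted_layer = nn.RNN(input_size=seq_len, hidden_size=100, batch_first=True, bidirectional=True)
out_p, _ = permuted_layer(x.permute(0, 2, 1))
print(out_p.shape)   # torch.Size([4, 2, 200])
```

In the permuted version the recurrence runs over the two channels rather than over time, so the time correlation would no longer be modeled by the recurrent connections.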
However, you've also said that [a, c, e] and [b, d, f] are results of independent processes. So naturally, they can also be separated into two independent sequences of the form batch_size x seq_len x 1. You could pass these two sequences through two separate BRNN layers and then combine the resulting features, either by concatenation along the feature dimension or by taking the sum, average, maximum, etc. A bit of light reading on the topic of multimodal fusion in deep learning may be helpful in this regard.
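A minimal sketch of the two-stream idea, assuming PyTorch; the module name TwoStreamBiRNN, its fusion argument and the sizes are hypothetical, chosen only for illustration. Each feature gets its own bidirectional RNN with input_size=1, and the two outputs are fused with one of the options mentioned above:

```python
import torch
import torch.nn as nn

class TwoStreamBiRNN(nn.Module):
    """Hypothetical module: one bidirectional RNN per independent input channel."""

    def __init__(self, hidden_size=100):
        super().__init__()
        # Each stream sees a single feature per time step (input_size=1).
        self.rnn0 = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True, bidirectional=True)
        self.rnn1 = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True, bidirectional=True)

    def forward(self, a, fusion="concat"):
        # a: [batch_size, seq_len, 2]; slicing with 0:1 keeps the feature dimension.
        out0, _ = self.rnn0(a[:, :, 0:1])  # [batch_size, seq_len, 2 * hidden_size]
        out1, _ = self.rnn1(a[:, :, 1:2])  # [batch_size, seq_len, 2 * hidden_size]
        if fusion == "concat":
            return torch.cat([out0, out1], dim=-1)  # [batch_size, seq_len, 4 * hidden_size]
        if fusion == "sum":
            return out0 + out1
        if fusion == "mean":
            return (out0 + out1) / 2
        if fusion == "max":
            return torch.maximum(out0, out1)  # element-wise maximum
        raise ValueError(f"unknown fusion: {fusion}")

# Usage with random stand-in data of shape [batch_size=4, seq_len=50, 2].
model = TwoStreamBiRNN(hidden_size=100)
a = torch.randn(4, 50, 2)
print(model(a).shape)         # torch.Size([4, 50, 400])
print(model(a, "max").shape)  # torch.Size([4, 50, 200])
```

Concatenation keeps both streams' features intact at the cost of a wider output; sum, mean and max keep the width at 2 * hidden_size but mix the two streams.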
Upvotes: 1
The question is how, if at all, your data contributes to the overall optimization problem. You said that the elements of a[:, 0] are time-correlated and the elements of a[:, 1] are time-correlated. Are a[i, 0] and a[i, 1] time-correlated? Does it make sense for both sequences to be set together?
If, for example, you are trying to predict whether a certain electrical machine is going to malfunction based on sequences of the voltage applied to the machine (a[:, 0]) and the humidity in the room (a[:, 1]) over time, and these signals were collected at the same time, then it is fine to combine them. But if they were collected at different times, does combining them make sense? Or if you had measured something other than humidity, would it help you predict a malfunction?
"Number of expected features" means the number of features in a single time step, so to speak. Going along with my previous analogy, it is how many signals (voltage, humidity, ...) I measure simultaneously.
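To make "features per time step" concrete, here is a small sketch, assuming PyTorch; voltage, humidity and temperature are hypothetical random stand-ins for signals sampled at the same instants. Stacking them along the last dimension produces the feature axis that input_size counts:

```python
import torch

# Hypothetical measurements: 3 machines, 50 time steps, all sampled at the same instants.
batch_size, seq_len = 3, 50
voltage = torch.randn(batch_size, seq_len)
humidity = torch.randn(batch_size, seq_len)

# Stack along a new last dimension: each time step now holds 2 features,
# matching an RNN with input_size=2 and batch_first=True.
a = torch.stack([voltage, humidity], dim=-1)
print(a.shape)  # torch.Size([3, 50, 2])

# a[:, t] holds the readings taken simultaneously at time step t.
assert torch.equal(a[:, 7, 0], voltage[:, 7])

# A third simultaneously measured signal would raise input_size to 3.
temperature = torch.randn(batch_size, seq_len)
a3 = torch.stack([voltage, humidity, temperature], dim=-1)
print(a3.shape)  # torch.Size([3, 50, 3])
```

The sequence length stays the same in both cases; only the number of signals recorded at each instant changes.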
Of course, this is only an example; you do not have to have a classification-over-time problem, it can be anything else. The bottom line is how your RNN and your data work together.
Upvotes: 1