Reputation: 1371
I would like to understand what exactly is going on with the dim_feedforward argument.
I have read that the feed-forward sub-layer inside the transformer layer is a "pointwise" feed-forward layer. What does "pointwise" mean in this context?
A feed-forward layer takes two arguments: input features and output features. This argument can't be the output features, since no matter what value I use for it, the output of the transformer layer always has the same shape. It also can't be the input features, since those are determined by the self-attention sub-layer.
Most importantly, where is the argument for the size of the tensors used for attention, the ones that translate the input into queries, keys, and values?
Upvotes: 2
Views: 3870
Reputation: 66
# Implementation of Feedforward model
# (from the __init__ of torch.nn.TransformerEncoderLayer)
self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)  # expands d_model -> dim_feedforward
self.dropout = Dropout(dropout)
self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)  # projects dim_feedforward -> d_model
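These two linear layers act on the last dimension only, so the same transformation is applied at every position of the sequence independently; that is what "pointwise" (or "position-wise") refers to. A minimal sketch of how they are used in the forward pass (simplified from PyTorch's internal _ff_block, with the extra dropout on the output dropped for brevity):

def _ff_block(self, x):
    # x has shape (seq_len, batch, d_model); Linear operates on the
    # last dim, so each position gets the same transform independently
    return self.linear2(self.dropout(self.activation(self.linear1(x))))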
So dim_feedforward is the number of features in the FFN's hidden layer. Usually its value is set several times larger than d_model (2048 by default).
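A quick way to confirm the asker's observation that this argument never changes the output shape (a sanity-check sketch; d_model=512 and nhead=8 are chosen here just for illustration):

import torch
import torch.nn as nn

x = torch.randn(10, 32, 512)  # (seq_len, batch, d_model)
for dim_ff in (256, 2048, 4096):
    layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=dim_ff)
    print(layer(x).shape)  # torch.Size([10, 32, 512]) every time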
Upvotes: 5