Reputation: 105
The neural network applications I've seen always learn the weights of their inputs and use fixed "hidden layers".
But I'm wondering about the following techniques:
1) fixed inputs, but the hidden layers are no longer fixed, in the sense that the functions of the input they compute can be tweaked (learned)
2) fixed inputs, but the hidden layers are no longer fixed, in the sense that although they contain clusters which compute fixed functions of their inputs (multiplication, addition, etc., much like the ALUs in a CPU or GPU), the weights of the connections among them, and between them and the input, can be learned (this should in some ways be equivalent to 1))
These could be used to model systems for which we know the inputs and the output but not how the input is turned into the output (figuring out what is inside a "black box"). Do such techniques exist and if so, what are they called?
Upvotes: 1
Views: 330
Reputation: 7592
For part (1) of your question, there are a couple of relatively recent techniques that come to mind.
The first is a type of feedforward layer called "maxout", which computes a piecewise linear function of its inputs.
Consider a traditional neural network unit with d inputs and a linear transfer function. We can describe the output of this unit as a function of its input z (a vector with d elements) as g(z) = w z, where w is a vector with d weight values.
In a maxout unit, the output of the unit is described as

g(z) = max_k w_k z

where w_k is a vector with d weight values, and there are k such weight vectors [w_1 ... w_k] per unit. Each of the weight vectors in the maxout unit computes some linear function of the input, and the max combines all of these linear functions into a single, convex, piecewise linear function. The individual weight vectors can be learned by the network, so that in effect each linear transform learns to model a specific part of the input (z) space.
You can read more about maxout networks at http://arxiv.org/abs/1302.4389.
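If it helps to see the forward pass concretely, here is a minimal NumPy sketch of a maxout layer. The dimensions d, k, m and the per-piece biases b are my own illustrative choices, and the learning step (backpropagating through the max) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, m = 5, 3, 4                 # input size, linear pieces per unit, units in the layer
W = rng.normal(size=(m, k, d))    # one weight vector of length d per piece, per unit
b = rng.normal(size=(m, k))       # one bias per piece

def maxout_layer(z):
    """Forward pass of a maxout layer: each unit outputs the max of its k affine pieces."""
    pieces = np.einsum('mkd,d->mk', W, z) + b   # affine outputs of all pieces, shape (m, k)
    return pieces.max(axis=1)                   # one value per unit, shape (m,)

z = rng.normal(size=d)
print(maxout_layer(z))            # vector of m unit activations
```

Because only the winning piece contributes to each unit's output, the gradient flows through exactly one weight vector per unit on each example, which is how the different pieces end up specializing to different regions of the input space.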
The second technique that has recently been developed is the "parametric ReLU" (PReLU) unit. In this type of unit, all neurons in a network layer compute an output g(z) = max(0, w z) + a min(w z, 0), as compared to the more traditional rectified linear unit, which computes g(z) = max(0, w z). The parameter a is shared across all neurons in a layer and is learned along with the weight vector w.
The PReLU technique is described in http://arxiv.org/abs/1502.01852.
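A similarly minimal sketch of the PReLU activation, applied to a precomputed pre-activation w z (the slope a = 0.25 here is just an illustrative value; during training it would be learned along with w):

```python
import numpy as np

def prelu(wz, a):
    """Parametric ReLU: identity for positive pre-activations, learned slope a for negative ones."""
    return np.maximum(0.0, wz) + a * np.minimum(wz, 0.0)

wz = np.array([-2.0, -0.5, 0.0, 1.5])   # example pre-activations w z
print(prelu(wz, a=0.25))                 # [-0.5   -0.125  0.     1.5  ]
```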
Maxout units have been shown to work well for a number of image classification tasks, particularly when combined with dropout to prevent overfitting. It's less clear how broadly useful parametric ReLU units are for modeling images, but the PReLU paper reports very strong results on ImageNet classification, which has for a while been considered the benchmark task in the field.
Upvotes: 2