Reputation: 521
After taking a bunch of online courses and reading many papers, I started playing with neural networks, but to my surprise the network fails to generalize a simple bitwise AND operation.
Inputs:
Inp#1 - randomly generated number between 0-15, scaled down to (0,1)
Inp#2 - 16 bit randomly generated unsigned int scaled down to (0,1)
# Code snippet
int in1 = (int)rand()%16;
int in2 = (int)rand()%(0x0010000);
in[0] = (fann_type)(in1/100.0); // not to worry about float roundup
in[1] = (fann_type)(in2/100000.0); // not to worry about float roundup
Outputs:
Out#1 = -1 if the bit of inp#2 at the index given by inp#1 is 0, otherwise 1
# Code snippet
int out1 = (in2 & (1<<in1)) ? 1 : -1;
out[0] = (fann_type)out1;
Network: I tried many different variations; below are some examples
A. 1 hidden layer with 30 neurons,
Activation Function (hidden): sigmoid,
Activation Function (output): sigmoid_symmetric (tanh),
Training method: RPROP
Learning rate: 0.7 (default)
Momentum: 0.0 (default)
RPROP Increase factor: 1.2 (default)
RPROP Decrease factor: 0.5 (default)
RPROP Minimum Step-size: 0 (default)
RPROP Maximum Step-size: 50 (default)
B. 3 hidden layers each having 30 neurons, with the same params as in A
C. Tried the same networks with the inputs scaled to (-1,1) and with tanh for the hidden layers as well.
Data Sets: 5000 samples for training, 5000 for testing and 5000 for validation. I also tried bigger datasets, without success.
# examples from training set
0.040000 0.321600
-1
0.140000 0.625890
1
0.140000 0.039210
-1
0.010000 0.432830
1
0.100000 0.102220
1
Process: the network was trained on the training set while I monitored the MSE on the test data in parallel to avoid possible overfitting.
Libraries: I used several, but mostly FANN, with fanntool as the GUI.
Any ideas? I can upload the datasets if there is any interest.
Upvotes: 2
Views: 420
Reputation: 66805
If I understand your setup, you are trying to do something like:
If this is true, this is an extremely peculiar problem and a very bad choice of architecture. Neural networks are not magic hats; they are a very large family of models. What you are trying to do has none of the characteristics expected of a function that a NN models well. It is completely non-smooth in the input and has lots of discontinuities; it is actually a bunch of if-else clauses.
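To make the "lots of discontinuities" point concrete, here is a minimal C sketch that counts how often the target from the question changes sign as the 16-bit value sweeps from 0 to 65535 for a fixed bit index. The helper names (`bit_target`, `count_sign_flips`, `print_flip_table`) are hypothetical and only for illustration; they are not part of FANN or of the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Target from the question: 1 if bit `bit_index` of `value` is set, else -1. */
int bit_target(unsigned int value, unsigned int bit_index)
{
    assert(bit_index < 16);
    return (value & (1u << bit_index)) ? 1 : -1;
}

/* Count how many times the target changes sign between neighbouring
 * values as `value` runs over the whole 16-bit range 0..65535. */
unsigned int count_sign_flips(unsigned int bit_index)
{
    unsigned int flips = 0;
    for (unsigned int v = 1; v < 65536u; ++v) {
        if (bit_target(v, bit_index) != bit_target(v - 1, bit_index))
            ++flips;
    }
    return flips;
}

/* Hypothetical helper: print the flip count for every bit index. */
void print_flip_table(void)
{
    for (unsigned int k = 0; k < 16; ++k)
        printf("bit %2u: %u sign flips\n", k, count_sign_flips(k));
}
```

For bit index k the target flips 2^(16-k) - 1 times over the input range, so for the lowest bit it alternates on every single step. After scaling the value down to (0,1), these flips are packed into a tiny interval of one scalar input, which is very far from the smooth functions a small sigmoid network approximates well.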
What should you do? Express your inputs as bits: use 32 inputs, 16 binary inputs per number, and the network will learn your function without any problems. You encoded the inputs in a very specific manner (by taking the decimal representation) and expect your network to model the decomposition into binary and then the operation on top of it. A NN can learn it, but you might need quite a complex network to achieve such an operation. Again, the whole reason is that you provided your network with a suboptimal representation and built a very simple network, of a kind that was originally designed to approximate smooth functions.
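The 32-input encoding suggested above could be sketched in C as follows. This is only one possible reading: it assumes the bit index is given as a one-hot vector over 16 inputs and the value as its 16 individual bits (least significant bit first). It also uses plain `float` instead of `fann_type` so the snippet compiles without FANN; the names `encode_inputs` and `encode_target` are hypothetical.

```c
#include <assert.h>

#define INDEX_INPUTS 16  /* assumption: one-hot encoding of the bit index */
#define VALUE_INPUTS 16  /* one input per bit of the 16-bit value */
#define TOTAL_INPUTS (INDEX_INPUTS + VALUE_INPUTS)

/* Fill `in` with 32 binary inputs:
 * in[0..15]  = one-hot encoding of bit_index,
 * in[16..31] = bits of value, least significant bit first. */
void encode_inputs(unsigned int bit_index, unsigned int value,
                   float in[TOTAL_INPUTS])
{
    assert(bit_index < INDEX_INPUTS);
    assert(value < 65536u);

    for (unsigned int i = 0; i < INDEX_INPUTS; ++i)
        in[i] = (i == bit_index) ? 1.0f : 0.0f;

    for (unsigned int i = 0; i < VALUE_INPUTS; ++i)
        in[INDEX_INPUTS + i] = ((value >> i) & 1u) ? 1.0f : 0.0f;
}

/* Same target as in the question: 1 if the selected bit is set, else -1. */
float encode_target(unsigned int bit_index, unsigned int value)
{
    assert(bit_index < INDEX_INPUTS);
    return (value & (1u << bit_index)) ? 1.0f : -1.0f;
}
```

With this representation the target only depends on whether some position i has both in[i] and in[16 + i] set, which is a far simpler relationship than decoding bits out of a single scaled scalar.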
Upvotes: 2