Reputation: 8154
I am trying to create a backpropagation network in Python. I have 14 input features and 1 output, and I am planning to use a multilayer neural network.
I have the following question:
1) What should the ratio of input-layer to hidden-layer neurons be?
I am a little confused about the hidden layer count.
Can anyone help me?
Thanks,
UPDATE:
My inputs and labels (each row is [[14 features], [label]]):
[[235, 2, -16.033171734306542, -828.0208534934904, 232965.81361002076, 2000.0, 11182359.053280996, 8565.232332709325, 4000.0, 0.019363246307941673, 1052153, 11313.47311827957, 105.79752842706958, 94],[10]],
[[-604, -6, 8.086235575302165, 380.8373042348658, 41190.53784866458, 2000.0, 1977145.8167358998, 420.30048579171057, 4000.0, 0.02123278230725872, 3436716, 36953.93548387097, 191.20880866382254, 94],[10]],
[[1825, 19, 14.022865897726179, -713.1319698367766, 97114.42605383566, 2000.0, 4661492.450584112, 1033.7486227812578, 4000.0, -0.019663774014977573, 3648687, 39233.1935483871, 197.01730672439965, 94],[10]],
[[-281, -2, -1.5372950205773066, 454.058413755312, 26895.611774858942, 2000.0, 1290989.3651932292, 765.2497914458995, 4000.0, -0.0033856767631790675, 5459685, 58706.290322580644, 241.00156704708152, 94],[10]],
[[1254, 13, 7.42946537169472, 236.81791472792207, 37351.8426913391, 2000.0, 1792888.4491842769, 923.863841127187, 4000.0, 0.03137205806507656, 5618776, 60416.94623655914, 244.48765360638856, 94],[10]],
[[55, 0, -6.799835826239174, -297.6057130887548, 7874.250847696101, 2000.0, 377964.04068941285, 66.64091494961357, 4000.0, 0.022848472079604405, 4150489, 44628.913978494624, 210.12886117302483, 94],[10]],
[[97, 1, 9.01187671470769, -55.32899089341877, 8218.299323445417, 2000.0, 394478.36752538, 127.66669905739745, 4000.0, -0.16287802414592073, 5331935, 57332.63440860215, 238.16530554628952, 94],[10]],
[[229, 2, 1.9250596458545362, -137.23162431944527, 16672.65593718128, 2000.0, 800287.4849847014, 130.52997477489504, 4000.0, -0.014027813599097374, 6905755, 74255.43010752689, 271.045159933551, 94],[10]],
[[107, 1, 6.470150940664045, 29.918507467688016, 26956.56324395225, 2000.0, 1293915.035709708, 165.12995290667556, 4000.0, 0.21625914820957587, 5269967, 56666.31182795699, 236.77727661962044, 94],[10]],
[[500, 5, 8.286114608469786, 122.0075128161886, 35446.863937609196, 2000.0, 1701449.4690052415, 253.11481415842877, 4000.0, 0.06791478997652628, 4669072, 50205.07526881721, 222.86986948307808, 94],[10]],
[[414, 4, 27.324467984592186, 485.55010485356297, 27500.260236682432, 2000.0, 1320012.4913607568, 214.55557670874316, 4000.0, 0.05627527975271053, 2489806, 26772.107526881722, 162.74918700976798, 94],[10]],
[[1044, 11, 4.238057309288552, -292.40132784218787, 8680.945668556162, 2000.0, 416685.3920906958, 475.7867593841577, 4000.0, -0.014493974225643315, 7271678, 78190.08602150538, 278.1335589168353, 94],[10]],
[[-528, -5, -10.252042152315722, 129.48476543188406, 20929.59991855366, 2000.0, 1004620.7960905757, 137.63411934477546, 4000.0, -0.07917566300653991, 7299292, 78487.01075268818, 278.6611608265341, 94],[10]]
]
Upvotes: 0
Views: 275
Reputation: 70068
So, to reframe the problem w/r/t the practical objective: you just need to come up with an integer value representing the number of neurons that comprise the hidden layer, such that:
your solver (gradient descent, conjugate gradient, etc.) converges during training; and
the total error (e.g., RMSE) falls below some threshold you've deemed acceptable (a quick check is sketched after this list).
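For the second criterion, here is a minimal sketch of an RMSE check you could call inside your training loop; the TOLERANCE value and the sample predictions below are placeholders, not from the question:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error over all training examples."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# hypothetical stopping rule: TOLERANCE is an assumed value, pick your own
TOLERANCE = 0.5
y_true = [10, 10, 10]
y_pred = [9.8, 10.3, 9.9]
if rmse(y_true, y_pred) < TOLERANCE:
    print("error below threshold -- stop training")
```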
Almost certainly there are several values that will satisfy both of those criteria; what's more, there's no reason you can't manually adjust this value during training.
In fact, this is a common practice, usually referred to as pruning.
Which nodes are candidates for pruning? Those with the smallest weight values associated with them (remember, weights apply to the connections between nodes, not to the nodes themselves). If a node's weights are near zero, then you know it has little influence on the result. A commonly used technique to visually examine these weights is a Hinton diagram (there's a code snippet in the gallery of the matplotlib homepage).
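A simple numerical companion to the Hinton diagram is to rank hidden neurons by the magnitude of their incoming weights. The weight matrix below is random, standing in for your trained weights:

```python
import numpy as np

# W_hidden is a stand-in for the (n_inputs x n_hidden) weight matrix of your
# trained network; substitute your real weights here
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(14, 16))

# mean absolute incoming weight per hidden neuron; near-zero means the neuron
# has little influence on the result and is a candidate for pruning
influence = np.abs(W_hidden).mean(axis=0)
weakest_first = np.argsort(influence)
print("pruning candidates (weakest first):", weakest_first[:3])
```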
Particularly given the latter, I recommend starting by just choosing some integer slightly larger than your input layer size. Why? Well, clearly the sizes of those layers can't be equal or your network won't have non-linearity, so that leaves either more neurons or fewer (relative to the input layer). As an initial guess to iterate on (i.e., to adjust as training progresses), the former is preferable, because the excess capacity will help your network converge, and that's what you want. Once it converges, adjust the number of nodes in the hidden layer downward and look at the effect on your total error.
In sum: in this case, begin with 16 or 18 neurons, observe the training progress, and prune accordingly.
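To make that starting point concrete, here is a minimal sketch of a one-hidden-layer backprop network in plain numpy, sized 14-16-1 as suggested above. The random data, learning rate, and epoch count are placeholders, not tuned for your problem:

```python
import numpy as np

rng = np.random.default_rng(42)

# 14 inputs -> 16 hidden (the initial guess above) -> 1 output
n_in, n_hidden, n_out = 14, 16, 1
lr = 0.01

# small random weights, zero biases
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

# stand-ins for your 13 examples; in practice, scale your real features first,
# since they span very different ranges
X = rng.normal(size=(13, n_in))
y = np.full((13, n_out), 10.0)

for epoch in range(2000):
    # forward pass: tanh hidden layer, linear output (regression)
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # backward pass for squared-error loss
    err = y_hat - y                      # dLoss/dy_hat (up to a constant)
    grad_W2 = h.T @ err / len(X)
    grad_b2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ dh / len(X)
    grad_b1 = dh.mean(axis=0)

    # gradient-descent step
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("final RMSE:", np.sqrt(np.mean((y_hat - y) ** 2)))
```

Changing n_hidden is then a one-line edit, which makes the prune-and-retrain loop described above easy to run.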
If you prefer a less empirical approach (i.e., one with some numerical justification behind it, which attempts to determine an optimal network size ex ante rather than via iteration and pruning), please see the accepted answer (mine) to a similar question.
Upvotes: 3