Reputation: 468
According to the Keras Tuner examples here and here, if you want to define the number of layers and each layer's units in a deep learning model using hyperparameters, you do something like this:
for i in range(hp.Int('num_layers', 1, 10)):
    model.add(layers.Dense(units=hp.Int('unit_' + str(i), min_value=32, max_value=512, step=32)))
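For reference, here is a minimal runnable sketch of how this pattern sits inside a complete model-building function (the input shape, output layer, optimizer, and loss are illustrative placeholders I've added, not part of the tuner examples):

from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten(input_shape=(28, 28)))  # placeholder input shape
    # num_layers and each unit_i are registered with the oracle here
    for i in range(hp.Int('num_layers', 1, 10)):
        model.add(layers.Dense(
            units=hp.Int('unit_' + str(i), min_value=32, max_value=512, step=32),
            activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))  # placeholder output
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model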
However, as others have noted here and here, after the oracle has seen a model with num_layers = 10, it will always assign a value to unit_0 through unit_9, even when num_layers is less than 10. When num_layers = 1, for example, only unit_0 is used to build the model, but unit_1 through unit_9 are still defined and active in the hyperparameters.
Does the oracle "know" that unit_1 through unit_9 weren't actually used to build the model, and therefore disregard them when judging that trial's results? Or does it assume unit_1 through unit_9 were used simply because they are defined (calling hp.get('unit_9'), for example, will return a value)?
In the latter case, the oracle is driving the tuning process with misinformation. At best it will take longer to converge; at worst it will converge to the wrong solution because it attributes relevance to hyperparameters that were never used.
Should the model actually be defined using conditional scopes, like this?
num_layers = hp.Int('num_layers', 1, 10)
for i in range(num_layers):
    with hp.conditional_scope('num_layers', list(range(i + 1, 10 + 1))):
        model.add(layers.Dense(units=hp.Int('unit_' + str(i), min_value=32, max_value=512, step=32)))
When the model is defined this way and num_layers < 10, calling hp.get('unit_9') raises ValueError: Conditional parameter unit_9 is not currently active, as expected.
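For completeness, the same sketch rewritten with conditional scopes (the surrounding layers and compile settings are the same illustrative placeholders as above):

from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten(input_shape=(28, 28)))  # placeholder input shape
    num_layers = hp.Int('num_layers', 1, 10)
    for i in range(num_layers):
        # unit_i is only active for trials where num_layers > i
        with hp.conditional_scope('num_layers', list(range(i + 1, 10 + 1))):
            model.add(layers.Dense(
                units=hp.Int('unit_' + str(i), min_value=32, max_value=512, step=32),
                activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))  # placeholder output
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model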
Upvotes: 7
Views: 1297
Reputation: 76
Using conditional scopes is the best approach, as it lets the tuner correctly recognize which parameters are active. Without conditional scopes there is, at least at the moment, no way to tell the tuner which parameters are actually used.
However, when using RandomSearch, the simpler approach (which leaves the inactive parameters in place) should give exactly the same result. When starting a new trial, the tuner goes through all the possibilities but rejects the invalid ones before actually starting the trial.
Of the existing tuners, I think only the Bayesian one is strongly affected by this. I am not 100% sure about Hyperband, but for RandomSearch the two approaches are exactly the same (apart from displaying inactive parameters, which confuses people).
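For example, a minimal sketch of driving the search with RandomSearch (build_model is the conditional-scope function from the question; the random data, max_trials, and epochs are placeholders just to make it runnable):

import numpy as np
import keras_tuner as kt

x = np.random.rand(128, 28, 28).astype('float32')  # placeholder data
y = np.random.randint(0, 10, size=(128,))

tuner = kt.RandomSearch(
    build_model,               # the conditional-scope hypermodel above
    objective='val_accuracy',
    max_trials=20,
    overwrite=True)
# Combinations already tried are rejected before a trial actually runs.
tuner.search(x, y, epochs=2, validation_split=0.2)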
Upvotes: 5