Reputation: 89
I'm trying to tune my model's hyperparameters with ml-engine, but I'm not quite sure whether it's working or not.
I'm not specifying the algorithm
field in HyperparameterSpec
, which, according to the documentation, should default to the Bayesian optimization method. I'm also not setting maxFailedTrials
, which, according to the documentation, should end the whole job if the first trial fails.
Here is my config:
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  hyperparameters:
    goal: MAXIMIZE
    maxTrials: 8
    maxParallelTrials: 2
    hyperparameterMetricTag: test_accuracy
    params:
      - parameterName: dropout_rate
        type: DOUBLE
        minValue: 0.3
        maxValue: 0.7
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: lr
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.0003
        scaleType: UNIT_LINEAR_SCALE
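For reference, both of the defaults mentioned above can also be written out explicitly in the spec. This is just an illustration, not part of my actual config, and the maxFailedTrials value is an arbitrary example:

```yaml
trainingInput:
  hyperparameters:
    # Explicit equivalents of the defaults described above:
    algorithm: ALGORITHM_UNSPECIFIED  # service picks the method (Bayesian optimization)
    maxFailedTrials: 3                # tolerate up to 3 failed trials before ending the job
```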
And here is the training output:
{
  "completedTrialCount": "8",
  "trials": [
    {
      "trialId": "1",
      "hyperparameters": {
        "lr": "0.00014959385395050048",
        "dropout_rate": "0.42217149734497067"
      },
      "startTime": "2019-10-07T09:40:02.143968039Z",
      "endTime": "2019-10-07T09:47:50Z",
      "state": "FAILED"
    },
    {
      "trialId": "2",
      "hyperparameters": {
        "dropout_rate": "0.62217149734497068",
        "lr": "0.00028292718728383382"
      },
      "startTime": "2019-10-07T09:40:02.144192681Z",
      "endTime": "2019-10-07T09:47:19Z",
      "state": "FAILED"
    },
    {
      "trialId": "3",
      "hyperparameters": {
        "lr": "0.00014846909046173097",
        "dropout_rate": "0.31717863082885739"
      },
      "startTime": "2019-10-07T09:48:09.266596472Z",
      "endTime": "2019-10-07T09:55:26Z",
      "state": "FAILED"
    },
    {
      "trialId": "4",
      "hyperparameters": {
        "lr": "0.00018741662502288819",
        "dropout_rate": "0.34178204536437984"
      },
      "startTime": "2019-10-07T09:48:10.761305330Z",
      "endTime": "2019-10-07T09:55:58Z",
      "state": "FAILED"
    },
    {
      "trialId": "5",
      "hyperparameters": {
        "dropout_rate": "0.6216828346252441",
        "lr": "0.00010192830562591553"
      },
      "startTime": "2019-10-07T09:56:15.904704865Z",
      "endTime": "2019-10-07T10:04:04Z",
      "state": "FAILED"
    },
    {
      "trialId": "6",
      "hyperparameters": {
        "dropout_rate": "0.42288427352905272",
        "lr": "0.000230206298828125"
      },
      "startTime": "2019-10-07T09:56:17.895067636Z",
      "endTime": "2019-10-07T10:04:05Z",
      "state": "FAILED"
    },
    {
      "trialId": "7",
      "hyperparameters": {
        "lr": "0.00019101441543291624",
        "dropout_rate": "0.36415641310447144"
      },
      "startTime": "2019-10-07T10:05:22.147233194Z",
      "endTime": "2019-10-07T10:13:09Z",
      "state": "FAILED"
    },
    {
      "trialId": "8",
      "hyperparameters": {
        "dropout_rate": "0.69955616224911532",
        "lr": "0.00029989311482522672"
      },
      "startTime": "2019-10-07T10:05:22.147396438Z",
      "endTime": "2019-10-07T10:13:30Z",
      "state": "FAILED"
    }
  ],
  "consumedMLUnits": 2.29,
  "isHyperparameterTuningJob": true,
  "hyperparameterMetricTag": "test_accuracy"
}
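Since the per-trial details get verbose, the failure pattern is also easy to confirm programmatically. A minimal sketch in plain Python, using an abbreviated copy of the trainingOutput above:

```python
import json

# Abbreviated copy of the trainingOutput shown above (two of the eight trials).
job_output = """
{
  "completedTrialCount": "8",
  "trials": [
    {"trialId": "1", "state": "FAILED"},
    {"trialId": "2", "state": "FAILED"}
  ]
}
"""

# Collect the ids of all trials that ended in the FAILED state.
trials = json.loads(job_output)["trials"]
failed = [t["trialId"] for t in trials if t["state"] == "FAILED"]
print(f"{len(failed)} of {len(trials)} listed trials failed: {failed}")
# → 2 of 2 listed trials failed: ['1', '2']
```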
All trials are run, so I believe it's the search algorithm that fails for some reason. I haven't been able to find any more information on why it returns this, or any logs from the search algorithm, even when running with a different verbosity.
To me it seems like it's not able to locate the metric in the TensorFlow event files, but I don't understand why, since the name is exactly the same; opening the event files with TensorBoard, I'm able to see the data. Maybe there are some requirements for the log structure that I'm not aware of?
The code for logging metrics:
from tensorflow.contrib.summary import summary as summary_ops

# in __init__
self.tf_board_writer = summary_ops.create_file_writer(self.save_path)

....

# During training
with self.tf_board_writer.as_default(), summary_ops.always_record_summaries():
    summary_ops.scalar(name=name, tensor=value, step=step)
A small side question, if anyone from the ml-engine team ends up in here: now that TF2 is stable and released, do you have any idea when it will be available in the runtime environment?
Anyways, hope someone can help me out :)
Upvotes: 0
Views: 223
Reputation: 89
The problem could be solved by using the Python package cloudml-hypertune
with the following code:

import hypertune

# in __init__
self.hpt = hypertune.HyperTune()

# During training
self.hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag=hyperparam_metric_name,
    metric_value=value,
    global_step=step)

And then set hyperparameterMetricTag
in HyperparameterSpec
to hyperparam_metric_name
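To tie the two halves together, the string passed as hyperparameter_metric_tag must exactly match the hyperparameterMetricTag in the job config. A sketch, reusing the tag from the question's config:

```yaml
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    # Must be the exact string reported via report_hyperparameter_tuning_metric:
    hyperparameterMetricTag: test_accuracy
```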
Upvotes: 1