Reputation: 13474
Consider this simple graph + session definition. Suppose I want to tune hyperparameters (learning rate and dropout keep probability) with a random search. What is the recommended way to implement it?
graph = tf.Graph()
with graph.as_default():
    # Placeholders
    data = tf.placeholder(tf.float32, shape=(None, img_h, img_w, num_channels), name='data')
    labels = ...
    dropout_keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    learning_rate = tf.placeholder(tf.float32, name='learning_rate')
    # model architecture...
with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    for step in range(num_steps):
        offset = (step * batch_size) % (train_length.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_images[offset:(offset + batch_size), :]
        # ...
        feed_train = {data: batch_data,
                      # ...
                      learning_rate: 0.001,
                      dropout_keep_prob: 0.7
                      }
I tried putting everything inside a function
def run_model(learning_rate, keep_prob):
    graph = tf.Graph()
    with graph.as_default():
        # graph here...
    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        # session here...
But I ran into scope issues (I am not very familiar with scopes in Python/TensorFlow). Is there a best practice for achieving this?
Upvotes: 9
Views: 4161
Reputation: 5344
I implemented random search over hyper-parameters in a similar way, and things worked out fine. Basically, I wrote a function that generates random hyper-parameters outside of the graph and session, wrapped the graph and session in a function as you did, and passed the generated hyper-parameters in. See the code below for illustration.
import numpy as np

def generate_random_hyperparams(lr_min, lr_max, kp_min, kp_max):
    '''Generate a random learning rate and keep probability.'''
    # sample the learning rate in log space
    random_learning_rate = 10 ** np.random.uniform(lr_min, lr_max)
    random_keep_prob = np.random.uniform(kp_min, kp_max)
    return random_learning_rate, random_keep_prob
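Sampling the learning rate in log space means every order of magnitude is equally likely, which is usually what you want for learning rates. As a quick sketch (the bounds here just match the usage example further down), a call like this returns a learning rate somewhere in [1e-5, 1e-1] and a keep probability in [0.2, 0.8]:
random_learning_rate, random_keep_prob = generate_random_hyperparams(-5, -1, 0.2, 0.8)
# random_learning_rate = 10**u with u ~ Uniform(-5, -1), i.e. between 1e-5 and 0.1
# random_keep_prob ~ Uniform(0.2, 0.8)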
I suspect the scope issue you are running into (since you didn't provide the exact error message, I can only speculate) is caused by some careless naming, so I would change how you name the variables in your run_model function.
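For example (this is only a guess, with hypothetical code, since the exact error wasn't given): if a function argument and a placeholder share the same name, assigning the placeholder shadows the Python float that was passed in, so the feed dict ends up mapping the placeholder to itself, and session.run then raises a TypeError because a Tensor cannot be used as a feed value.
def run_model(learning_rate, keep_prob):  # learning_rate, keep_prob are Python floats here
    graph = tf.Graph()
    with graph.as_default():
        # rebinding the same names: the floats passed in are no longer reachable
        learning_rate = tf.placeholder(tf.float32, name='learning_rate')
        keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    with tf.Session(graph=graph) as session:
        # both keys and values are now Tensors, which session.run rejects
        feed_train = {learning_rate: learning_rate, keep_prob: keep_prob}
        session.run([...], feed_dict=feed_train)
Renaming the function arguments avoids that collision: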
def run_model(random_learning_rate, random_keep_prob):
    # Note that the arguments are named differently from the placeholders in the graph
    graph = tf.Graph()
    with graph.as_default():
        # graph here...
        learning_rate = tf.placeholder(tf.float32, name='learning_rate')
        keep_prob = tf.placeholder(tf.float32, name='keep_prob')
        # other operations ...
    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        # session here...
        feed_train = {data: batch_data,
                      # placeholder variables as dict keys, plain Python values as dict values
                      learning_rate: random_learning_rate,
                      keep_prob: random_keep_prob
                      }
        # evaluate performance with random_learning_rate and random_keep_prob
        performance = session.run([...], feed_dict=feed_train)
    return performance
Remember to use different Python variable names for the tf.placeholder objects and for the variables that carry the actual Python values.
Using the above snippets would look something like this:
performance_records = {}
for i in range(10):  # sample the hyper-parameter space 10 times
    random_learning_rate, random_keep_prob = generate_random_hyperparams(-5, -1, 0.2, 0.8)
    performance = run_model(random_learning_rate, random_keep_prob)
    performance_records[(random_learning_rate, random_keep_prob)] = performance
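Afterwards you can read the best configuration straight out of the dictionary. A minimal sketch, assuming performance is a single scalar where higher is better (e.g. validation accuracy):
best_lr, best_kp = max(performance_records, key=performance_records.get)
print('best learning rate: %g, best keep prob: %g' % (best_lr, best_kp))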
Upvotes: 4