Reputation: 2688
I am currently studying the Google TensorFlow Object Detection API. When I try to retrain the model on the Oxford-IIIT Pet dataset, the training process is very slow.
Here is what I found so far:
I am trying to profile it with the TensorFlow profiler, but I am in a bit of a hurry, so any ideas or suggestions would be helpful.
Upvotes: 1
Views: 1853
Reputation: 2296
There are many possible reasons for this. The most common is a problem with your TFRecord file. Each image and its contour (bounding box) should be validated before it is added to the record file. Some useful checks are:
First, check that the image decodes correctly before writing it to the record:
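import tensorflow as tf  # TensorFlow 1.x API

def checkJPG(fn):
    # Return True if the file at path fn decodes as a JPEG, False otherwise.
    with tf.Graph().as_default():
        try:
            image_contents = tf.read_file(fn)
            image = tf.image.decode_jpeg(image_contents, channels=3)
            # No variables or tables are used, so no initializer is needed.
            with tf.Session() as sess:
                sess.run(image)  # raises if the file is missing or corrupted
        except Exception:
            print("Corrupted file: ", fn)
            return False
    return True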
Also, check the width and height of each contour, and make sure that no contour touches or crosses the image borders:
boxW = xmax - xmin
boxH = ymax - ymin
if boxW == 0 or boxH == 0:
    print("...ONE CONTOUR SKIPPED... (boxW | boxH) = 0")
    continue
if boxW*boxH < 100:
    print("...ONE CONTOUR SKIPPED... (boxW*boxH) < 100")
    continue
if xmin / width <= 0 or xmax / width <= 0 or ymin / height <= 0 or ymax / height <= 0:
    print("...ONE CONTOUR SKIPPED... (x | y) <= 0")
    continue
if xmin / width >= 1 or xmax / width >= 1 or ymin / height >= 1 or ymax / height >= 1:
    print("...ONE CONTOUR SKIPPED... (x | y) >= 1")
    continue
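The checks above run inside a loop over the boxes of one image. As a rough sketch, they could be wrapped in a standalone helper so they are easy to test on their own. The function name is_valid_box and the min_area parameter are made up for illustration. Two things differ from the snippet above and are assumptions on my part: it also rejects inverted boxes (negative width or height), and it converts to float before dividing so that integer pixel coordinates are not truncated to 0 under Python 2.

```python
def is_valid_box(xmin, ymin, xmax, ymax, width, height, min_area=100):
    """Return True if the box is large enough and lies strictly inside the image.

    Hypothetical helper: coordinates are in pixels, width/height are the
    image dimensions, min_area is the smallest accepted box area in pixels.
    """
    box_w = xmax - xmin
    box_h = ymax - ymin
    # Reject empty boxes, and (unlike the original checks) inverted ones too.
    if box_w <= 0 or box_h <= 0:
        return False
    # Reject boxes that are too small to be useful.
    if box_w * box_h < min_area:
        return False
    # float() avoids integer division under Python 2.
    normalized = (
        xmin / float(width),
        xmax / float(width),
        ymin / float(height),
        ymax / float(height),
    )
    # Every normalized coordinate must be strictly between 0 and 1.
    return all(0.0 < c < 1.0 for c in normalized)
```

In the record-writing loop this would be used as "if not is_valid_box(...): continue".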
Another possible reason is that the evaluation record file contains too much data. It's better to put only 10 images in your evaluation record file and change the evaluation config like this:
eval_config {
  num_visualizations: 10
  num_examples: 10
  eval_interval_secs: 3000
  max_evals: 1
  use_moving_averages: false
}
Upvotes: 1
Reputation: 2688
I found the problem. The issue was with the input: my TFRecord file was somehow corrupted, so the input thread sometimes hung.
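For anyone who wants a quick sanity check for this kind of corruption, here is a minimal sketch that walks the record framing of a TFRecord file without TensorFlow. It assumes the standard TFRecord layout: an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload. The function name count_tfrecord_records is made up for illustration. It only detects truncated or misframed records; it does not verify the checksums, so a file that passes may still be damaged.

```python
import os
import struct


def count_tfrecord_records(path):
    """Count records by following the length headers.

    Raises ValueError if a header or payload runs past the end of the file.
    Checksums are NOT verified; this is a framing check only.
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while f.tell() < size:
            # uint64 payload length followed by a uint32 CRC of that length.
            header = f.read(12)
            if len(header) < 12:
                raise ValueError("truncated header after record %d" % count)
            (length,) = struct.unpack("<Q", header[:8])
            # The payload is followed by a uint32 CRC of the payload.
            end = f.tell() + length + 4
            if end > size:
                raise ValueError("record %d runs past end of file" % count)
            f.seek(end)  # skip the payload instead of reading it
            count += 1
    return count
```

If this raises, or the count does not match the number of examples you wrote, the file is worth regenerating. To check the checksums as well, iterating over the file with TensorFlow's own record reader should fail on a damaged record.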
Upvotes: 1
Reputation: 625
As far as I can see, it is not using the GPU right now. Have you tried the GPU optimizations described in the TensorFlow performance guide?
https://www.tensorflow.org/performance/performance_guide#optimizing_for_gpu
Upvotes: 0