Reputation: 7486
I'm running a Keras model, and the odd thing is that it runs faster without the CPU extensions (it should be the other way around). See the output below.
Is there a config file where I can set the inter_op_parallelism_threads option?
Using TensorFlow backend.
2018-10-18 17:21:32.620461: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 17:21:32.621535: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Results: -33.20 (23.69) MSE
real 2m55.990s
user 4m8.784s
sys 3m50.192s
Using TensorFlow backend.
2018-10-18 17:25:04.773578: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Results: -32.57 (23.16) MSE
real 1m48.847s
user 2m15.792s
sys 0m13.440s
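For reference, the two sets of `time` figures above can be compared with a few lines of Python. This is only a sketch: `parse_time` is a helper name made up for this example, and the values are copied from the output above.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '2m55.990s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs shown above.
first_run = {"real": "2m55.990s", "user": "4m8.784s", "sys": "3m50.192s"}
second_run = {"real": "1m48.847s", "user": "2m15.792s", "sys": "0m13.440s"}

for key in ("real", "user", "sys"):
    a = parse_time(first_run[key])
    b = parse_time(second_run[key])
    print(f"{key}: {a:.3f}s vs {b:.3f}s, ratio {a / b:.2f}")
```

The `real` ratio comes out at roughly 1.6, and the `sys` time differs far more than the wall-clock time does.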
Upvotes: 1
Views: 1271
Reputation: 63
Here is the code I'm using with Keras; just put it at the top of your script.
import os

NUM_PARALLEL_EXEC_UNITS = 6  # adjust to your machine

# Set the OpenMP/MKL variables before TensorFlow is imported,
# so the OpenMP runtime is sure to see them when it starts up.
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
    inter_op_parallelism_threads=1,
    allow_soft_placement=True,
    device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)
session = tf.Session(config=config)
K.set_session(session)
Note: I'm a little disappointed with the results. Tuning these parameters alone gave me a speedup of at most 150%.
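To see how much these settings matter on your own machine, one option is to time the same script under several thread counts. The sketch below uses only the standard library. `time_command` and `sweep` are names made up for this example, and the `python -c "pass"` workload is a placeholder to replace with your own training script. It only varies `OMP_NUM_THREADS`, so it assumes the script does not overwrite that variable itself.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, num_threads):
    """Run cmd with OMP_NUM_THREADS set and return the wall-clock seconds."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def sweep(cmd, thread_counts):
    """Time cmd once for each thread count and return {count: seconds}."""
    return {n: time_command(cmd, n) for n in thread_counts}


if __name__ == "__main__":
    # Placeholder workload; replace with the command that runs your model.
    cmd = [sys.executable, "-c", "pass"]
    for n, seconds in sweep(cmd, [1, 2, 4, 6]).items():
        print(f"OMP_NUM_THREADS={n}: {seconds:.2f}s")
```

A single run per setting is noisy, so repeating each measurement a few times and comparing the best or median time gives a fairer picture.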
Upvotes: 1