A matrix chain is a chain of matrix–matrix products. Consider the following chain:
ABC, where A and B are of size 3000×3000 and C is of size 3000×600.
There are two ways to evaluate the above expression which significantly differ in performance:
Variant 1: (AB)C : 6.48e10 FLOPs
Variant 2: A(BC) : 2.16e10 FLOPs
The cost of a matrix–matrix multiplication AB, where A is of size m×n and B is of size n×k, is 2·m·n·k FLOPs. Using this formula, I obtained the FLOP counts for the two variants above.
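For reference, both counts follow directly from that formula; a quick sanity check in plain Python (the helper name is mine, not part of any library):

```python
def matmul_flops(m, n, k):
    """FLOPs for multiplying an (m x n) matrix by an (n x k) matrix."""
    return 2 * m * n * k

n = 3000
# Variant 1: (AB)C -> AB is 3000x3000, then (AB) times C (3000x600)
v1 = matmul_flops(n, n, n) + matmul_flops(n, n, n // 5)
# Variant 2: A(BC) -> BC is 3000x600, then A times (BC)
v2 = matmul_flops(n, n, n // 5) + matmul_flops(n, n, n // 5)
print(v1)  # 64800000000, i.e. 6.48e10
print(v2)  # 21600000000, i.e. 2.16e10
```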
If the parenthesization is not explicitly specified, my TensorFlow build (version 2.8 with eager mode disabled) chooses only variant 1 (left-to-right parenthesization), whose execution time is almost three times that of variant 2. Although I could optimize this and parenthesise manually by explicitly computing the FLOPs of each matrix multiplication, I'm curious whether this could be done automatically by the Grappler graph optimiser used by TensorFlow. Are there any other graph optimisers that could automatically choose the best parenthesization?
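For context on what such an optimiser would have to do, the optimal parenthesization of a chain can be found with the textbook matrix-chain dynamic program in O(n³) time. A minimal sketch (function names and structure are my own, not a TensorFlow API; `dims` lists the chain dimensions, so ABC above becomes `[3000, 3000, 3000, 600]`):

```python
def matrix_chain_order(dims):
    """Classic matrix-chain DP: matrix i has shape dims[i] x dims[i+1].
    Returns the minimal-FLOP table and the chosen split points."""
    n = len(dims) - 1  # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]    # cost[i][j]: min FLOPs for matrices i..j
    split = [[0] * n for _ in range(n)]   # split[i][j]: best split point k
    for length in range(2, n + 1):        # subchain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):         # try every split (i..k)(k+1..j)
                c = (cost[i][k] + cost[k + 1][j]
                     + 2 * dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost, split

def parenthesize(split, i, j, names):
    """Render the optimal parenthesization as a string."""
    if i == j:
        return names[i]
    k = split[i][j]
    return "(%s %s)" % (parenthesize(split, i, k, names),
                        parenthesize(split, k + 1, j, names))

cost, split = matrix_chain_order([3000, 3000, 3000, 600])
print(parenthesize(split, 0, 2, "ABC"))  # -> (A (B C))
print(cost[0][2])                        # -> 21600000000, i.e. 2.16e10
```

For this chain it recovers variant 2, A(BC), at 2.16e10 FLOPs.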
Sample script to observe the performance effect of the two parenthesizations:
import tensorflow as tf
import os
import time

class bcolors:
    WARNING = '\033[93m'
    ENDC = '\033[0m'

# Check if MKL is enabled
import tensorflow.python.framework as tff
print(bcolors.WARNING + "MKL Enabled : ", tff.test_util.IsMklEnabled(), bcolors.ENDC)

# Set threads
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.run_functions_eagerly(False)

# Problem size
n = 3000
reps = 10
DTYPE = tf.float32

@tf.function
def mc_non_optimized(A, B, C):
    # Default parenthesization (Variant 1)
    start = tf.timestamp()
    with tf.control_dependencies([start]):
        ret = A @ B @ C
    with tf.control_dependencies([ret]):
        end = tf.timestamp()
    tf.print("Non Optimized : ", end - start)
    return ret

@tf.function
def mc_optimized(A, B, C):
    # Optimized parenthesization (Variant 2)
    start = tf.timestamp()
    with tf.control_dependencies([start]):
        # I do not want to manually find the optimum parenthesization every time
        ret = A @ (B @ C)
    with tf.control_dependencies([ret]):
        end = tf.timestamp()
    tf.print("Optimized : ", end - start)
    return ret

A = tf.random.normal([n, n], dtype=DTYPE)
B = tf.random.normal([n, n], dtype=DTYPE)
C = tf.random.normal([n, n // 5], dtype=DTYPE)

for i in range(reps):
    ret = mc_non_optimized(A, B, C)
    ret = mc_optimized(A, B, C)
    tf.print("\n")
The execution times were measured with TensorFlow 2.8 (CPU) built with Intel MKL and Python 3.9.7, run on a 2018 MacBook Pro under macOS Big Sur.