Reputation: 885
With the latest version of TensorFlow now available on Windows, I am trying to get everything working as efficiently as possible. However, even when compiling from source, I can't figure out how to enable the SSE and AVX instructions.
The default process: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake has no mention of how to do this.
The only reference I have found has been using Google's Bazel: How to compile Tensorflow with SSE4.2 and AVX instructions?
Does anyone know of an easy way to turn on these advanced instructions using MSBuild? I hear they give at least a 3X speed up.
To help those looking for a similar solution, the warning I am currently getting looks like this: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake
I am using Windows 10 Professional on a 64-bit platform, Visual Studio 2015 Community Edition, and Anaconda Python 3.6 with CMake version 3.6.3 (later versions don't work for TensorFlow).
Upvotes: 7
Views: 15382
Reputation: 31
TensorFlow's CMake script makes a mistake with the flag "tensorflow_WIN_CPU_SIMD_OPTIONS".
It should be a string flag, not a Boolean option.
In "Tensorflow-github/tensorflow/contrib/cmake/CMakeLists.txt", line 34, there is:
option(tensorflow_WIN_CPU_SIMD_OPTIONS "Enables CPU SIMD instructions")
Replace it with
set(tensorflow_WIN_CPU_SIMD_OPTIONS "/arch:AVX" CACHE STRING "Enables CPU SIMD instructions" )
Then clear the CMake cache (location) and reconfigure.
You will find that tensorflow_WIN_CPU_SIMD_OPTIONS is now a flag with a text input field instead of a checkbox.
Either "/arch:AVX" or "/arch:AVX2" can be entered.
Upvotes: 1
Reputation: 780
Well, I tried to fix that, but I am not sure if it really worked.
In CMakeLists.txt you will find the following statements:
if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
On the MSVC platform, the test fails because MSVC doesn't support the -march=native flag. I modified the statements as follows:
if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
    endif()
  endif()
endif()
With this change, CMake checks whether /arch:AVX is available and uses it. According to the MSDN documentation, the /arch:SSE2 option is the default for x86 builds but is not offered for x64 builds, where SSE2 is already part of the baseline. For x64 builds you can choose AVX or AVX2. I used AVX above because my CPU only supports AVX; you can try AVX2 if you have a compatible CPU.
When compiling with the above CMakeLists.txt, compilation was much slower than for the official release. The warning about AVX/AVX2 disappeared, but the warnings about SSE/SSE2/3/4.1/4.2 remain. I think these warnings can be ignored because MSVC offers no SSE /arch option for x64 builds.
I am testing the new pip package now. It may be faster than before, but I don't want to write a new benchmark ...
If anyone is interested in this, please test whether the new package is really faster.
I did all this on the latest git master branch, 2017-3-12. The pip package name shows that it was tensorflow 1.0.1.
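For anyone who wants to run that comparison, here is a minimal timing harness, offered as a sketch rather than a proper benchmark. The names `benchmark` and `speedup` are made up for illustration, and the workload is whatever callable you pass in; nothing here is specific to TensorFlow.

```python
import statistics
import time


def benchmark(fn, repeats=5, warmup=1):
    """Time a zero-argument callable and return simple statistics in seconds.

    `fn` is a hypothetical stand-in for the workload you want to compare,
    for example a function that runs one large matrix multiplication.
    """
    # Warm-up runs are executed but not timed (first calls often pay
    # one-off initialisation costs).
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "runs": len(timings),
    }


def speedup(baseline, candidate):
    """Ratio of median times; values above 1.0 mean `candidate` is faster."""
    return baseline["median"] / candidate["median"]
```

Run the same workload once in an environment with the official wheel and once with the custom build, then compare the two medians. Keeping the workload and input sizes identical matters more than the exact number of repeats.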
Upvotes: 6
Reputation: 4945
I think you would have to add /arch:AVX2 to the compiler flags.
One way to do it is to modify the CMakeCache.txt in your build folder. Look for the line starting with CMAKE_CXX_FLAGS:STRING and change it to
CMAKE_CXX_FLAGS:STRING=/DWIN32 /D_WINDOWS /W3 /GR /EHsc /arch:AVX2 /fp:fast
However, according to this issue on GitHub, /arch:AVX2 is broken at the moment (at HEAD).
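If you would rather script the cache edit than do it by hand, something like the following might work. It is only a sketch: `add_cxx_flag` is a made-up helper that operates on the text of CMakeCache.txt, and it assumes the cache contains a single `CMAKE_CXX_FLAGS:STRING=...` line as shown above.

```python
def add_cxx_flag(cache_text, flag, key="CMAKE_CXX_FLAGS:STRING"):
    """Return the CMakeCache.txt text with `flag` appended to the `key` entry.

    Hypothetical helper: only the line whose name is exactly `key` is
    touched, so entries such as CMAKE_CXX_FLAGS_RELEASE are left alone.
    """
    out = []
    for line in cache_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name == key:
            flags = value.split()
            # Skip the edit if the flag is already present (any letter case).
            if flag.lower() not in (f.lower() for f in flags):
                flags.append(flag)
                line = name + "=" + " ".join(flags)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if cache_text.endswith("\n") else "")
```

To use it, read CMakeCache.txt from your build folder, pass the text through `add_cxx_flag(text, "/arch:AVX2")`, write the result back, and then re-run the CMake generate step so the project files pick up the change.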
Upvotes: 3