Reputation: 31
This is about running TensorFlow natively on Windows with GPU support (v0.12).
While some examples work (matmul.py) and show a clear performance difference between GPU (1.3 s) and CPU (4.4 s), I get an issue with one example:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
While others have had problems with the cuDNN library not being loaded, in my case the library is found and loaded correctly:
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
Does anybody have the same issue? Has anybody been able to solve it? Can I do something to get more logging about what is going wrong?
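For reference, the matmul timing comparison mentioned above looks roughly like this (a minimal sketch using the TF 0.12 Python API; my actual matmul.py differs in the details):
import time
import tensorflow as tf

def time_matmul(device, n=4096, iters=10):
    # Build a graph that multiplies two random n x n matrices on the given device.
    with tf.Graph().as_default():
        with tf.device(device):
            a = tf.random_normal([n, n])
            b = tf.random_normal([n, n])
            c = tf.matmul(a, b)
        # allow_soft_placement falls back to CPU for ops without a GPU kernel;
        # log_device_placement prints where each op actually runs.
        config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
        with tf.Session(config=config) as sess:
            start = time.time()
            for _ in range(iters):
                sess.run(c)
            return time.time() - start

print("GPU: %.2fs" % time_matmul("/gpu:0"))
print("CPU: %.2fs" % time_matmul("/cpu:0"))
The device placement log also confirms that the ops actually land on /gpu:0, which is where the NUMA message shows up.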
Upvotes: 3
Views: 4920
Reputation: 126184
Although TensorFlow logs this message at error level, you can probably ignore it, unless you are running a multi-GPU configuration with different GPUs attached to different NUMA nodes. As the comment in the code says:
if (numa_node < 0) {
  // For some reason the StreamExecutor couldn't get the NUMA
  // affinity of the GPU. If this is not a multi-socket mobo with
  // GPUs local to different buses, it doesn't matter. If it is, we
  // may run into trouble later with data transfer operations. The
  // trouble may manifest as slower than expected performance, or
  // outright failures.
  LOG(ERROR) << "Could not identify NUMA node of " << name
             << ", defaulting to 0. Your kernel may not have been built "
                "with NUMA support.";
  numa_node = 0;
}
As it turns out, the code to discover NUMA nodes is only implemented on Linux, as it uses SysFS. If you are running a big-iron Windows server with multiple GPUs and NUMA, please let us know in a GitHub issue, so we can prioritize adding this support.
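For context, the Linux code path essentially reads the device's NUMA affinity from SysFS. Here is a minimal Python sketch of that lookup (the PCI bus ID below is just an illustrative example; SysFS reports -1 when the kernel has no NUMA information for the device, and the file does not exist at all on Windows):
# Hypothetical PCI bus ID; the real one comes from the GPU's PCI address
# (e.g. as reported by nvidia-smi).
pci_bus_id = "0000:01:00.0"
path = "/sys/bus/pci/devices/%s/numa_node" % pci_bus_id
try:
    with open(path) as f:
        # -1 means the kernel has no NUMA information for this device.
        numa_node = int(f.read().strip())
    print("NUMA node:", numa_node)
except IOError:
    # No SysFS on Windows, so TensorFlow falls back to NUMA node 0.
    print("numa_node not available; defaulting to 0")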
Upvotes: 3