Reputation: 188
I am currently trying to run a Convolutional Neural Network using Keras on a TensorFlow backend, following a Udemy course on deep learning. However, it is running extremely slowly, taking around 1,000 seconds per epoch, while the lecturer's machine takes around 60 seconds (and he is running it on a CPU, by the way).
The CNN is a simple image-recognition network that classifies an image as either a cat or a dog. The training and test data consist of 10,000 images in total, which together take up 237 MB on my SSD.
When I run the CNN in a Python shell, I get the following output:
Epoch 1/25
2017-05-28 13:23:03.967337: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.967574: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968153: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968329: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968576: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:04.505726: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:28:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
2017-05-28 13:23:04.505944: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-28 13:23:04.506637: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0: Y
2017-05-28 13:23:04.506895: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:28:00.0)
2684/8000 [=========>....................] - ETA: 845s - loss: 0.5011 - acc: 0.7427
This should indicate that TensorFlow is using the GPU for its computations. However, when I check nvidia-smi, I get the following output:
$ nvidia-smi
Sun May 28 13:25:46 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.53 Driver Version: 376.53 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 WDDM | 0000:28:00.0 On | N/A |
| 0% 49C P2 36W / 166W | 7240MiB / 8192MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7676 C+G ...ost_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 8580 C+G Insufficient Permissions N/A |
| 0 9704 C+G ...x86)\Google\Chrome\Application\chrome.exe N/A |
| 0 10532 C ...\Anaconda3\envs\tensorflow-gpu\python.exe N/A |
| 0 11384 C+G Insufficient Permissions N/A |
| 0 12896 C+G C:\Windows\explorer.exe N/A |
| 0 13868 C+G Insufficient Permissions N/A |
| 0 14068 C+G Insufficient Permissions N/A |
| 0 14568 C+G Insufficient Permissions N/A |
| 0 15260 C+G ...osoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A |
| 0 16912 C+G ...am Files (x86)\Dropbox\Client\Dropbox.exe N/A |
| 0 18196 C+G ...I\AppData\Local\hyper\app-1.3.3\Hyper.exe N/A |
| 0 18228 C+G ...oftEdge_8wekyb3d8bbwe\MicrosoftEdgeCP.exe N/A |
| 0 20032 C+G ...indows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
+-----------------------------------------------------------------------------+
Note that every single process is shown as using both the CPU and the GPU (Type C+G), while the TensorFlow process is the only one using just the CPU (Type C).
Is there any sensible explanation for this? I have been trying to fix this issue for the whole of the last week but have gotten nowhere.
I am running a Windows 10 Pro machine with an Asus Nvidia GTX 1070, 24 GB of RAM and an Intel Xeon X5670 CPU @ 2.93 GHz. I created my Anaconda environment with the following commands:
conda create -n tensorflow-gpu python=3.5 anaconda
source activate tensorflow-gpu
conda install theano
conda install mingw libpython
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
pip install keras
conda update --all
I also installed the CUDA Toolkit and cuDNN and added their respective folders to my %PATH%.
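As a quick sanity check (a minimal snippet against the TF 1.x API, not part of the course code), the devices TensorFlow detects inside this environment can be listed and checked for a /gpu:0 entry:

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see; a "/gpu:0" entry should appear alongside
# the CPU if the GPU build, CUDA and cuDNN are all wired up correctly.
print(device_lib.list_local_devices())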
Any and all help would be greatly appreciated.
[EDIT]
Here is the code, in case anything is wrong with it:
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Defining the CNN
classifier = Sequential()
# Convolution 1
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Convolution 2
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Flatten + MLP
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
Upvotes: 3
Views: 505
Reputation: 1064
My reply has nothing to do with the Udemy example; it is simply about checking whether the GPU is being utilized. On Linux, the PSensor utility allows you to observe GPU load and temperature (which is a good indication of usage). I'm not sure how to confirm GPU usage on Windows; perhaps someone else can help with that.
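One OS-agnostic alternative (a minimal sketch against the TF 1.x API used in the question, not something I have verified on this exact setup) is to ask TensorFlow itself to log where each operation is placed and then watch the console during training:

import tensorflow as tf
from keras import backend as K

# Log the device (CPU or GPU) chosen for every op TensorFlow places.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
K.set_session(sess)  # make Keras use this session before the model is built

If the convolution and matmul ops are reported on /gpu:0, the GPU is being used even when nvidia-smi shows low utilization.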
Upvotes: 0
Reputation: 11377
It does not have anything to do with your machine; I discussed the problem in this post on Udemy. Everyone seems to have the same issue and wonders how it could take only 20 minutes on the instructor's machine. The answer is simple: the instructor posted different source code than what he presented in the video!
Check the documentation for steps_per_epoch:
steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
Currently, a single epoch takes 8000 * 32 = 256,000 images. That is the number of samples you are processing in every epoch, which makes no sense whatsoever when your dataset is merely 10,000 images (20k with augmentation).
If you check the video, you'll see the instructor is using samples_per_epoch, which means 32x less data. Case solved.
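Concretely, steps_per_epoch in Keras 2 counts batches, so a corrected call (a sketch assuming the 8000 training and 2000 test images and the batch_size = 32 from the question) would look like this:

# steps are measured in batches, so divide the sample counts by the batch size
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // 32,   # 250 batches = one pass over the training set
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000 // 32)  # 62 batches covers roughly the whole test set

With that change each epoch sees the training data once instead of 32 times, which is exactly where the 32x slowdown comes from.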
Upvotes: 4