Reputation: 188
I am currently trying to run a Convolutional Neural Network using Keras on a TensorFlow backend, following a Udemy course on deep learning. However, it is running extremely slowly, taking around 1,000 seconds per epoch, while the lecturer's machine takes around 60 seconds (and he is running it on a CPU, by the way).
The CNN is a simple image-recognition network that classifies an image as either a cat or a dog. The training and test data consist of 10,000 images in total, which together take up 237 MB on my SSD.
When I run the CNN in a Python shell, I get the following output:
Epoch 1/25
2017-05-28 13:23:03.967337: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.967574: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968153: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968329: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968576: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:04.505726: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:28:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
2017-05-28 13:23:04.505944: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-28 13:23:04.506637: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0: Y
2017-05-28 13:23:04.506895: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:28:00.0)
2684/8000 [=========>....................] - ETA: 845s - loss: 0.5011 - acc: 0.7427
This should indicate that TensorFlow is using the GPU for its computations. However, when I check nvidia-smi, I get the following output:
$ nvidia-smi
Sun May 28 13:25:46 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.53 Driver Version: 376.53 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 WDDM | 0000:28:00.0 On | N/A |
| 0% 49C P2 36W / 166W | 7240MiB / 8192MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7676 C+G ...ost_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 8580 C+G Insufficient Permissions N/A |
| 0 9704 C+G ...x86)\Google\Chrome\Application\chrome.exe N/A |
| 0 10532 C ...\Anaconda3\envs\tensorflow-gpu\python.exe N/A |
| 0 11384 C+G Insufficient Permissions N/A |
| 0 12896 C+G C:\Windows\explorer.exe N/A |
| 0 13868 C+G Insufficient Permissions N/A |
| 0 14068 C+G Insufficient Permissions N/A |
| 0 14568 C+G Insufficient Permissions N/A |
| 0 15260 C+G ...osoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A |
| 0 16912 C+G ...am Files (x86)\Dropbox\Client\Dropbox.exe N/A |
| 0 18196 C+G ...I\AppData\Local\hyper\app-1.3.3\Hyper.exe N/A |
| 0 18228 C+G ...oftEdge_8wekyb3d8bbwe\MicrosoftEdgeCP.exe N/A |
| 0 20032 C+G ...indows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
+-----------------------------------------------------------------------------+
Note that every single process is shown as using both the CPU and the GPU (Type C+G), while the TensorFlow process is the only one using just the CPU (Type C).
Is there any sensible explanation for this? I have been trying to fix this issue for the whole of the last week but have gotten nowhere.
I am running a Windows 10 Pro machine with an Asus Nvidia GTX 1070, 24 GB of RAM and an Intel Xeon X5670 CPU @ 2.93 GHz. I created my Anaconda environment with the following commands:
conda create -n tensorflow-gpu python=3.5 anaconda
source activate tensorflow-gpu
conda install theano
conda install mingw libpython
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
pip install keras
conda update --all
I also installed the CUDA Toolkit and cuDNN and added their respective folders to my %PATH%.
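As a quick sanity check (a minimal snippet against the TF 1.x API, not part of the course code), the devices TensorFlow detects inside this environment can be listed and checked for a /gpu:0 entry:

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see; a "/gpu:0" entry should appear alongside
# the CPU if the GPU build, CUDA and cuDNN are all wired up correctly.
print(device_lib.list_local_devices())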
Any and all help would be greatly appreciated.
[EDIT]
Here is the code, in case anything is wrong with it:
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Defining the CNN
classifier = Sequential()
# Convolution 1
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Convolution 2
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Flatten + MLP
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
Upvotes: 3
Views: 505
Reputation: 1064
My reply has nothing to do with the Udemy example; it is simply about checking whether the GPU is being utilized. On Linux, the PSensor utility allows you to observe GPU load and temperature (which is a good indication of usage). I'm not sure how to confirm GPU usage on Windows; perhaps someone else can help with that.
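One OS-agnostic alternative (a minimal sketch against the TF 1.x API used in the question, not something I have verified on this exact setup) is to ask TensorFlow itself to log where each operation is placed and then watch the console during training:

import tensorflow as tf
from keras import backend as K

# Log the device (CPU or GPU) chosen for every op TensorFlow places.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
K.set_session(sess)  # make Keras use this session before the model is built

If the convolution and matmul ops are reported on /gpu:0, the GPU is being used even when nvidia-smi shows low utilization.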
Upvotes: 0
Reputation: 11377
It does not have anything to do with your machine; I discussed the problem in this post on Udemy. Everyone seems to have the same issue and wonders how it could take only 20 minutes on the instructor's machine. The answer is simple: the instructor posted different source code than what he presented in the video!
Check the documentation for steps_per_epoch:
steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
Currently, a single epoch takes 8000 * 32 = 256,000 images. That is the number of samples you are processing in every epoch, which makes no sense whatsoever when your dataset is merely 10,000 images (20k with augmentation).
If you check the video, you'll see the instructor is using samples_per_epoch, which means 32x less data. Case solved.
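Concretely, steps_per_epoch in Keras 2 counts batches, so a corrected call (a sketch assuming the 8000 training and 2000 test images and the batch_size = 32 from the question) would look like this:

# steps are measured in batches, so divide the sample counts by the batch size
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // 32,   # 250 batches = one pass over the training set
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000 // 32)  # 62 batches covers roughly the whole test set

With that change each epoch sees the training data once instead of 32 times, which is exactly where the 32x slowdown comes from.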
Upvotes: 4