Reputation: 2013
tensorflow.__version__: '0.12.head'
I have 5 networks with different structures and parameters. I want to deploy them on a server with a GPU. In my understanding, it is more efficient to process data in batches, but I don't know how to determine the batch size.
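To make concrete what I mean by "processed in batches": incoming request images would be grouped into fixed-size chunks before being fed to the network. A simplified sketch (this helper is hypothetical, not part of my actual server code):

```python
import numpy as np

def make_batches(pending_images, max_batch_size):
    """Group queued request images into batches of at most max_batch_size."""
    return [np.stack(pending_images[i:i + max_batch_size])
            for i in range(0, len(pending_images), max_batch_size)]

# 10 queued 256x256 RGB images with max_batch_size 4 -> batch sizes 4, 4, 2
images = [np.random.rand(256, 256, 3) for _ in range(10)]
batches = make_batches(images, 4)
```

The open question is what max_batch_size the GPU can actually afford.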
So I tried the following:
net_output = create_graph()
sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
Running sess1.run produced warnings like
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 1.51GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
It seemed that sess1 could still run with batch_size 64, even though the warning suggested 64 was too large for the available memory.
After that the server received requests to use the other four networks. So I loaded the networks with four sessions:
# assume 5 networks have the same structure
sess2 = tf.Session()
sess2.run(tf.global_variables_initializer())
sess3 = tf.Session()
sess3.run(tf.global_variables_initializer())
sess4 = tf.Session()
sess4.run(tf.global_variables_initializer())
sess5 = tf.Session()
sess5.run(tf.global_variables_initializer())
Then I ran the first network again:
batch_size = 64
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
This time I got an exception:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape...
So I decided to unload the other networks:
sess2.close()
sess3.close()
sess4.close()
sess5.close()
sess1.run(net_output, {net_input_img: np.random.rand(batch_size, 256, 256, 3)})
However, I still got the ResourceExhaustedError when I called sess1.run after closing the other four sessions.
My questions are:
Does Session.close release GPU memory? Why is sess1.run unable to run a forward pass after I launched and closed sess2 through sess5?
Are there any better ways to deploy more than one network on a server with a single GPU?
I thought that maybe I could launch 5 processes, one for each network, and call
config = tf.ConfigProto(gpu_options=tf.GPUOptions(
    per_process_gpu_memory_fraction=0.2))
sess = tf.Session(config=config)
in each process. However, I was afraid that the largest workable batch size would shrink compared to per_process_gpu_memory_fraction=1.0.
[Update]
I ran the following code:
import tensorflow as tf
import tensorflow.contrib.slim as slim
def create_graph(x):
    dummy = tf.zeros([100, 100, 100, 5])
    return slim.repeat(x, 3, slim.conv2d, 87, [5, 5])
batch_size = 64
x = tf.zeros([batch_size, 256, 256, 3])
output = create_graph(x)
sess1 = tf.Session()
sess1.run(tf.global_variables_initializer())
sess1.run(output)
num_other_sessions = 50
other_sessions = []
for _ in range(num_other_sessions):
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    other_sessions.append(sess)

try:
    sess1.run(output)
except Exception as e:
    print(e)

for sess in other_sessions:
    sess.close()
# If I run the following two lines, the bottom sess1.run(output) could be run without error.
# del sess
# del other_sessions

try:
    sess1.run(output)
except Exception as e:
    print(e)
When I called sess1.run(output) for the first time, I only got a warning:
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB.
After I launched the other 50 sessions, calling sess1.run(output) raised a ResourceExhaustedError. I tried closing those sessions, but it didn't help.
Part of the log output:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.228
pciBusID 0000:03:00.0
Total memory: 5.93GiB
Free memory: 5.84GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 124.62MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:03:00.0)
...
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 1, Chunks in use: 0 7.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 1, Chunks in use: 0 25.5KiB allocated for chunks. 25.5KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 1, Chunks in use: 0 586.2KiB allocated for chunks. 384.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 739.2KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:666] Size: 586.2KiB | Requested Size: 384.0KiB | in_use: 0, prev: Size: 25.5KiB | Requested Size: 25.5KiB | in_use: 1, next: Size: 739.2KiB | Requested Size: 739.2KiB | in_use: 1
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780900 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780c00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309780f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1309781800 of size 256
...
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x130eb8c100 of size 7168
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310cf5a00 of size 26112
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1310d0f200 of size 600320
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 255 Chunks of size 256 totalling 63.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 303 Chunks of size 512 totalling 151.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 768 totalling 2.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 1280 totalling 60.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 3 Chunks of size 1536 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 49 Chunks of size 26112 totalling 1.22MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 49920 totalling 48.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50944 totalling 49.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 98 Chunks of size 756992 totalling 70.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 783104 totalling 2.99MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 50331648 totalling 48.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1459617792 totalling 2.72GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2904102656 totalling 2.70GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 5.54GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 5953290240
InUse: 5952656640
MaxInUse: 5953264128
NumAllocs: 1259
MaxAllocSize: 2904102656
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************xxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 739.2KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:975] Resource exhausted: OOM when allocating tensor with shape[87,87,5,5]
Upvotes: 1
Views: 1571
Reputation: 271
Tensors hold most of the memory in TensorFlow. Most tensors are immutable and live only within a single session.run() call. Others, such as variables and queues, can live longer. Depending on the type of session you are using, a DirectSession releases its variables when the session is closed.
So, from your code, it is a bit strange that opening other sessions, running the global variables initializer, and closing them has an impact on memory usage. Maybe you can simplify and share a complete repro case so we can understand the full picture.
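One thing worth trying in the meantime (an assumption on my side, not a diagnosis of what the allocator is doing in your case): by default a session reserves most of the GPU memory up front, and allow_growth makes it allocate only what it actually needs:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it
# when the session is created (a mitigation, not a guaranteed fix).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```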
Upvotes: 1