Zhenzhong Pan

Reputation: 136

Why does TensorFlow allocate more memory than requested on the GPU? Is there any function to determine how much memory is allocated?

The following question is not about how to configure the fraction of GPU memory used.

CPU:

FixedLengthRecordReaderV2 allocation_description { requested_bytes: 64 allocated_bytes: 64 allocator_name: "cpu" allocation_id: 107996

GPU:

Reshape/shape" tensor { dtype: DT_INT32 shape { dim { size: 1 } } allocation_description { requested_bytes: 4 allocated_bytes: 256 allocator_name: "cuda_host_bfc" allocation_id: 329 ptr: 1112161657600 } } }

"Unknown" tensor { dtype: DT_UINT8 shape { dim { size: 3073 } } allocation_description { requested_bytes: 3073 allocated_bytes: 3328 allocator_name: "gpu_bfc" allocation_id: 152161 has_single_reference: true ptr: 1108327235584 } } }

DecodeRaw" tensor { dtype: DT_UINT8 shape { dim { size: 3073 } } allocation_description { requested_bytes: 3073 allocated_bytes: 4864 allocator_name: "cuda_host_bfc" allocation_id: 35574 has_single_reference: true ptr: 1112190177280 } } }

transpose/perm" tensor { dtype: DT_INT32 shape { dim { size: 3 } } allocation_description { requested_bytes: 12 allocated_bytes: 256 allocator_name: "cuda_host_bfc" allocation_id: 331 ptr: 1112161658112 } } }

stack" tensor { dtype: DT_INT32 shape { dim { size: 3 } } allocation_description { requested_bytes: 12 allocated_bytes: 256 allocator_name: "cuda_host_bfc" allocation_id: 332 ptr: 1112161658368 } } }

1. Why does TensorFlow allocate more memory than requested on the GPU?

2. Is there any function to determine how much memory is allocated?

For the first question, I guess the purpose is to reduce the frequency of allocations. But I cannot understand why this mechanism is adopted on the GPU while the CPU memory allocator does not use it.

I am more interested in the second question.
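For what it is worth, one way to query the allocator's counters at runtime (a minimal sketch, assuming a TF 1.x build where tf.contrib.memory_stats is available) looks like this:

import tensorflow as tf
# Assumption: TF 1.x with tf.contrib; these ops report the BFC
# allocator's counters for whichever device they are placed on.
from tensorflow.contrib.memory_stats import BytesInUse, MaxBytesInUse

with tf.device('/gpu:0'):
    bytes_in_use = BytesInUse()   # bytes currently allocated on this GPU
    peak_bytes = MaxBytesInUse()  # high-water mark since process start

with tf.Session() as sess:
    current, peak = sess.run([bytes_in_use, peak_bytes])
    print('current: %d bytes, peak: %d bytes' % (current, peak))

But this only reports totals per device, not the per-tensor breakdown shown in the logs above.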

Does anyone know the answer? Any information will be helpful.

Upvotes: 0

Views: 270

Answers (1)

sietschie

Reputation: 7553

This is probably due to memory alignment: you cannot get a chunk of memory smaller than 256 bytes, and larger requests are rounded up to the next multiple of 256 bytes. Rounding 3073 up this way gives 3328, which matches the gpu_bfc entry, but it does not explain "requested_bytes: 3073 allocated_bytes: 4864".
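As a sanity check, here is a tiny sketch of that rounding arithmetic in Python (the 256-byte alignment is an assumption inferred from the log values, not something taken from the TensorFlow source):

def round_up(requested, alignment=256):
    # Round `requested` up to the next multiple of `alignment`.
    return -(-requested // alignment) * alignment

print(round_up(4))     # 256  -> matches requested_bytes: 4,  allocated_bytes: 256
print(round_up(12))    # 256  -> matches requested_bytes: 12, allocated_bytes: 256
print(round_up(3073))  # 3328 -> matches the gpu_bfc entry, but not the 4864 one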

Upvotes: 1
