Reputation: 544

How to use L2 Cache in CUDA

I have searched other threads on using the L2 cache in CUDA but could not find a solution. How do I make use of the L2 cache? Is there a function call or declaration that enables it? For example, shared memory is declared with __device__ __shared__. Is there anything like that for the L2 cache?

Upvotes: 3

Answers (1)

Greg Smith

Reputation: 11549

The L2 cache is transparent to device code. All accesses to memory (global, local, surface, texture, constant, and instruction) that do not hit in L1 go to L2. All writes go through L2.
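
Since there is no declaration for L2, the most a program can do directly is ask the runtime how much of it the device has. A minimal sketch, assuming device 0 and a standard CUDA runtime installation (the file layout and messages are just illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: query the first GPU in the system
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // l2CacheSize is reported in bytes; there is nothing to "enable",
    // the hardware uses this cache for every access that misses in L1.
    std::printf("%s: L2 cache size = %d bytes\n", prop.name, prop.l2CacheSize);
    return 0;
}
```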

CUDA C Programming Guide F.4.2 : Global Memory

This section provides a few more details on L2.

The ptxas option -dlcm=cg (passed through nvcc as -Xptxas -dlcm=cg) can be used to make global loads uncached in L1 and cached only in L2.
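
As a rough illustration, the kernel source does not change at all when switching cache modes; only the build command does. The sketch below uses a hypothetical file name copy.cu and a trivial copy kernel. Whether the option makes a measurable difference depends on the GPU architecture, so treat it as something to profile rather than a guaranteed win.

```cuda
// Default load caching:            nvcc copy.cu -o copy_default
// Global loads cached in L2 only:  nvcc -Xptxas -dlcm=cg copy.cu -o copy_cg
#include <cstdio>
#include <cuda_runtime.h>

// Plain global-memory copy; the loads from `in` are the accesses
// affected by the -dlcm setting.
__global__ void copy(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *in = nullptr, *out = nullptr;

    cudaMalloc(&in, bytes);
    cudaMalloc(&out, bytes);
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy<<<blocks, threads>>>(in, out, n);

    // Synchronize so that any launch or execution error is reported here.
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("copy kernel: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```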

CUDA C Programming Guide B.5 : Memory Fence Functions

The function __threadfence() can be used to make sure that all writes to global memory made by the calling thread before the fence are visible in L2, and therefore to other threads on the device.

The function __threadfence_system() can be used to make sure that all writes to global memory are visible to host threads.
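
A common use of __threadfence() is to let the last block to finish consume results written by all other blocks. The sketch below is modeled on that pattern; the names sumBlocks, partial and blocksDone are made up for illustration, and the per-block sum is done serially by one thread to keep the example short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__device__ unsigned int blocksDone = 0;  // number of blocks that have published a partial sum

__global__ void sumBlocks(const float* in, volatile float* partial,
                          float* total, int n) {
    __shared__ bool isLast;

    if (threadIdx.x == 0) {
        // Thread 0 sums this block's slice of the input.
        float s = 0.0f;
        int begin = blockIdx.x * blockDim.x;
        int end = min(begin + (int)blockDim.x, n);
        for (int i = begin; i < end; ++i) s += in[i];
        partial[blockIdx.x] = s;

        // Make the partial sum visible device-wide before taking a ticket.
        __threadfence();

        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        isLast = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    // Only the last block to arrive reads every partial sum.
    if (isLast && threadIdx.x == 0) {
        float t = 0.0f;
        for (unsigned int b = 0; b < gridDim.x; ++b) t += partial[b];
        *total = t;
        blocksDone = 0;  // reset so the kernel can be launched again
    }
}

int main() {
    const int n = 1024, threads = 256, blocks = n / threads;
    std::vector<float> host(n, 1.0f);
    float *in, *partial, *total, result = 0.0f;

    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&partial, blocks * sizeof(float));
    cudaMalloc(&total, sizeof(float));
    cudaMemcpy(in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sumBlocks<<<blocks, threads>>>(in, partial, total, n);
    cudaMemcpy(&result, total, sizeof(float), cudaMemcpyDeviceToHost);

    std::printf("sum = %.1f (expected %.1f)\n", result, (float)n);

    cudaFree(in); cudaFree(partial); cudaFree(total);
    return result == (float)n ? 0 : 1;
}
```

Without the fence, the last block could observe the incremented counter before another block's partial sum has become visible, which is why the write, the fence and the atomic are issued in that order.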

Upvotes: 4
