sgarizvi

Reputation: 16796

CUDA memset in Compute Visual Profiler

I use Compute Visual Profiler to measure the performance of my CUDA programs.

The profiler output shows two different entries for the cudaMemset function:

  1. memset32_post
  2. memset128

What is the difference between these two?


Upvotes: 0

Views: 394

Answers (1)

Tom

Reputation: 21108

I would guess that the memset128 kernel does the bulk of the work and the memset32_post kernel cleans up the remainder, because the size you passed is not a multiple of 128.
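To make the guessed "bulk plus cleanup" split concrete, here is a minimal CPU-side model of the idea. It is not the CUDA driver's implementation. `kBulkBytes`, `MemsetPlan`, `plan_memset` and `modeled_memset` are hypothetical names, and treating the bulk unit as 128 bytes is an assumption based only on the kernel name. The tail pass writes single bytes for simplicity, whereas the real memset32_post presumably uses 32-bit stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: the wide "bulk" pass works in 128-byte units. The real
// memset128 kernel's granularity is not documented here.
constexpr std::size_t kBulkBytes = 128;

// How a requested byte count is divided between the two passes.
struct MemsetPlan {
    std::size_t bulk_bytes;  // handled by the wide bulk pass (memset128's role)
    std::size_t tail_bytes;  // left over for the cleanup pass (memset32_post's role)
};

// Largest multiple of kBulkBytes goes to the bulk pass; the rest is the tail.
MemsetPlan plan_memset(std::size_t count) {
    std::size_t bulk = (count / kBulkBytes) * kBulkBytes;
    return {bulk, count - bulk};
}

// CPU model: fill the bulk region in kBulkBytes-sized chunks, then the tail.
void modeled_memset(unsigned char* dst, unsigned char value, std::size_t count) {
    assert(dst != nullptr || count == 0);
    MemsetPlan p = plan_memset(count);

    unsigned char chunk[kBulkBytes];
    std::memset(chunk, value, kBulkBytes);
    for (std::size_t off = 0; off < p.bulk_bytes; off += kBulkBytes) {
        std::memcpy(dst + off, chunk, kBulkBytes);  // stands in for the bulk kernel
    }
    for (std::size_t i = 0; i < p.tail_bytes; ++i) {
        dst[p.bulk_bytes + i] = value;              // stands in for the cleanup kernel
    }
}
```

For example, under this model a 1000-byte memset becomes 896 bytes of bulk work (7 × 128) plus a 104-byte tail, while a 1024-byte memset needs no tail pass at all.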

There's nothing to worry about: the runtime is just trying to implement the memset as efficiently as possible. That said, I'd try to avoid calling memset in an inner loop (on any processor). If you're really worried about this, you could over-allocate so that the buffer size is a multiple of 128.
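A minimal sketch of the over-allocation idea: round the byte count up to the next multiple of 128 before allocating and clearing the buffer. `round_up` is a hypothetical helper, and the assumption that 128-byte multiples avoid the extra cleanup kernel comes from the guess above, not from NVIDIA documentation. The CUDA runtime calls appear only in comments, and `n` and `d_ptr` there are illustrative.

```cpp
#include <cstddef>

// Round `bytes` up to the next multiple of `alignment` (alignment > 0).
constexpr std::size_t round_up(std::size_t bytes, std::size_t alignment) {
    return (bytes + alignment - 1) / alignment * alignment;
}

// Usage idea (CUDA runtime calls, shown as comments only):
//   std::size_t padded = round_up(n * sizeof(float), 128);
//   cudaMalloc(&d_ptr, padded);
//   cudaMemset(d_ptr, 0, padded);  // whole padded buffer, so no ragged tail
```

The cost is up to 127 wasted bytes per buffer, which is usually negligible next to launching a second kernel on every call.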

Upvotes: 1
