LogWell

Reputation: 45

`cublasIsamin` returns an incorrect value

My arrays are stored interleaved as xyzxyz..., and I want to get the indices of the maximum and minimum along one direction (x, y, or z). Here is the test program:

#include <cuda_runtime.h>
#include <cuda_runtime_api.h> // cudaMalloc, cudaMemcpy, etc.
#include <cublas_v2.h>
#include <helper_functions.h> // shared functions common to CUDA Samples
#include <helper_cuda.h>      // CUDA error checking

#include <stdio.h> // printf
#include <iostream>

template <typename T>
void print_arr(T *arr, int L)
{
    for (int i = 0; i < L; i++)
    {
        std::cout << arr[i] << " ";
    }
    std::cout << std::endl;
}

int main()
{
    float hV[10] = {3, 0, 7, 1, 2, 8, 6, 7, 6, 4};
    print_arr(hV, 10);

    float *dV;
    cudaMalloc(&dV, sizeof(float) * 10);
    cudaMemcpy(dV, hV, sizeof(float) * 10, cudaMemcpyHostToDevice);

    cublasHandle_t cublasHandle = NULL;
    checkCudaErrors(cublasCreate(&cublasHandle));

    int hResult[2] = {0};
    checkCudaErrors(cublasIsamax(cublasHandle, 10, dV, 3, hResult + 0));
    checkCudaErrors(cublasIsamin(cublasHandle, 10, dV, 3, hResult + 1));
    print_arr(hResult, 2);

    return 0;
}

expected result:

3 0 7 1 2 8 6 7 6 4 
3 2

result:

3 0 7 1 2 8 6 7 6 4 
3 5

Is there a problem with this result, or have I misunderstood something?

link to cublasIsamin.

Upvotes: 1

Views: 166

Answers (1)

Robert Crovella

Reputation: 151889

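cublasIsamin finds the index of the element with the minimum magnitude. This index is not a position in the original array: it is the 1-based position within the strided sequence selected by the incx parameter. Furthermore, the function searches over n elements of that sequence (n is the argument right after the handle) regardless of other parameters such as incx. With n=10 and incx=3 it therefore reads well past the end of a 10-element array.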

You have an array like this:

index:    0 1 2 3 4 5 6 7 8 9
x/y/z:    x y z x y z x y z x
value:    3 0 7 1 2 8 6 7 6 4
x index:  1     2     3     4

Therefore, among the x values (3, 1, 6, 4), the maximum is at x index 3 and the minimum is at x index 2, searching over a total of n=4 (not 10) elements. To search the x values, we must begin at offset 0 of dV with an increment of 3, for a maximum of n=4 elements.
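To make the indexing convention concrete, here is a minimal CPU-only sketch of the same search. The names `strided_iamax` and `strided_iamin` are hypothetical helpers written for illustration, not cuBLAS functions. They assume n and incx are positive and report the first match when there is a tie.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical CPU reference (not part of cuBLAS): examines
// x[0], x[incx], ..., x[(n-1)*incx] and returns the 1-based position,
// within that strided sequence, of the first element with the largest
// absolute value.
int strided_iamax(int n, const float *x, int incx)
{
    assert(n > 0 && incx > 0); // this sketch only handles positive n and incx
    int best = 0;
    for (int i = 1; i < n; ++i)
    {
        if (std::fabs(x[i * incx]) > std::fabs(x[best * incx]))
            best = i;
    }
    return best + 1; // convert to the 1-based BLAS convention
}

// Same idea for the smallest absolute value.
int strided_iamin(int n, const float *x, int incx)
{
    assert(n > 0 && incx > 0);
    int best = 0;
    for (int i = 1; i < n; ++i)
    {
        if (std::fabs(x[i * incx]) < std::fabs(x[best * incx]))
            best = i;
    }
    return best + 1;
}
```

Called on the host array from the question with n=4 and incx=3, these helpers visit only the values 3, 1, 6, 4. Calling them with n=10 would index past the end of the 10-element array, which mirrors the problem in the original calls.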

Taking all this into account, the correct calls are:

checkCudaErrors(cublasIsamax(cublasHandle, 4, dV, 3, hResult + 0));
checkCudaErrors(cublasIsamin(cublasHandle, 4, dV, 3, hResult + 1));

And the expected result is:

3 2
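The same reasoning extends to the y and z components. The sketch below computes the element count for a given component and maps a returned 1-based index back to a position in the full array. Both `component_count` and `to_array_index` are hypothetical helper names introduced here for illustration, and the sketch assumes the components are tightly interleaved with a fixed stride.

```cpp
#include <cassert>

// Hypothetical helper: number of elements belonging to one component in an
// interleaved array of `total` values with `stride` components per point,
// where `offset` selects the component (0 = x, 1 = y, 2 = z for stride 3).
int component_count(int total, int stride, int offset)
{
    assert(stride > 0 && offset >= 0 && offset < stride);
    if (offset >= total)
        return 0; // the array is too short to contain this component
    return (total - offset + stride - 1) / stride; // ceiling division
}

// Hypothetical helper: converts a 1-based strided result into a 0-based
// index in the full interleaved array.
int to_array_index(int result, int stride, int offset)
{
    assert(result >= 1);
    return offset + (result - 1) * stride;
}
```

For the 10-element array above, this gives 4 elements for x and 3 each for y and z. To search the y values, one would pass `dV + 1` as the vector pointer together with the count for offset 1. The results 3 and 2 for x map back to positions 6 and 3 in the original array, which hold the values 6 and 1.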

Upvotes: 2
