Raidriar

Reputation: 1

How to parallelize this code with pthreads?

I am attempting to solve a class assignment with pthreads. I have defined my struct and tried to pass the variables into my run2 function, but the only result is "Segmentation fault (core dumped)". The code in run2 was originally in the KNN_POSIX2 function; I separated it out in preparation for making it multithreaded. I figured I would try with a single POSIX thread first and see how it goes, but no dice so far. I am brand new to C, so please be patient with me.

typedef struct PASSING_PARAMS2 {
    int* thread_identifier;
    ArffData* traindata;
    ArffData* testdata;
    int k;
    int tcount;
    int qIndex;
    int* parampred;
    int* paramclasscount;
    float* paramcandidates;
    int paramnumclasses;
} PassingParams2;

void* run2(void* ptr) {
    PassingParams2 *paramPtr = (PassingParams2 *)ptr;
    ArffData* train = (ArffData*)paramPtr->traindata;
    ArffData* test = (ArffData*)paramPtr->testdata;
    float* candidates = (float*)paramPtr->paramcandidates;
    int k = (int)paramPtr->k;
    int* classCounts = (int*)paramPtr->paramclasscount;
    int num_classes = (int) paramPtr->paramnumclasses;

    for(int queryIndex = 0; queryIndex < test->num_instances(); queryIndex++) {
        for(int keyIndex = 0; keyIndex < train->num_instances(); keyIndex++) {
            
            float dist = distance(test->get_instance(queryIndex), train->get_instance(keyIndex));

            // Add to our candidates
            for(int c = 0; c < k; c++){
                if(dist < candidates[2*c]){
                    // Found a new candidate
                    // Shift previous candidates down by one
                    for(int x = k-2; x >= c; x--) {
                        candidates[2*x+2] = candidates[2*x];
                        candidates[2*x+3] = candidates[2*x+1];
                    }
                    
                    // Set key vector as potential k NN
                    candidates[2*c] = dist;
                    candidates[2*c+1] = train->get_instance(keyIndex)->get(train->num_attributes() - 1)->operator float(); // class value

                    break;
                }
            }
        }

        // Bincount the candidate labels and pick the most common
        for(int i = 0; i < k;i++){
            classCounts[(int)candidates[2*i+1]] += 1;
        }
        
        int max = -1;
        int max_index = 0;
        for(int i = 0; i < num_classes;i++){
            if(classCounts[i] > max){
                max = classCounts[i];
                max_index = i;
            }
        }

        predictions[queryIndex] = max_index;

        for(int i = 0; i < 2*k; i++){ candidates[i] = FLT_MAX; }
        memset(classCounts, 0, num_classes * sizeof(int));
    }
    pthread_exit(0);
}

int* KNN_POSIX2(ArffData* train, ArffData* test, int k, int t) {

    // Predictions is the array where you have to return the class predicted (integer) for the test dataset instances
    int* predictions = (int*)malloc(test->num_instances() * sizeof(int));

    // Stores k-NN candidates for a query vector as a sorted 2d array. First element is inner product, second is class.
    float* candidates = (float*) calloc(k*2, sizeof(float));
    for(int i = 0; i < 2*k; i++){ candidates[i] = FLT_MAX; }

    int num_classes = train->num_classes();

    // Stores bincounts of each class over the final set of candidate NN
    int* classCounts = (int*)calloc(num_classes, sizeof(int));
    
    //Setup of the parameters to be passed by the struct
    PassingParams2 *paramPtr;
    paramPtr = (PassingParams2 *) malloc(1* sizeof(PassingParams2));
    paramPtr->thread_identifier = (int*)malloc(1 * sizeof(int));
    paramPtr->traindata = train;
    paramPtr->k = k;
    paramPtr->tcount = t;
    paramPtr->paramnumclasses = num_classes;

    // create the threads
    pthread_t newthread;
    pthread_create(&newthread, NULL, &run2, (void*) paramPtr);

    return predictions;
}

Upvotes: 0

Views: 132

Answers (1)

Solomon Slow

Reputation: 27115

Your run2(...) function appears to use a free variable, predictions, as if it were an array of int or a pointer to an array of int. Where is predictions declared? Where is it initialized? It is NOT the same as the local variable predictions that is declared and initialized in the KNN_POSIX2(...) function.

Is it possible that predictions is a global int* variable, and is it possible that you are using it uninitialized in run2(...)?
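
One way to avoid depending on a global is to carry the output pointer inside the argument struct, alongside everything else the thread reads. The sketch below is only an illustration under assumptions: WorkerArgs, label_worker, and run_label_worker are hypothetical names, and the thresholding loop is a stand-in for the k-NN work. Note also that the KNN_POSIX2 shown in the question never assigns testdata, parampred, paramcandidates, or paramclasscount before calling pthread_create, so run2 would be reading uninitialized pointers from those fields as well.

```cpp
#include <pthread.h>
#include <stddef.h>

// Hypothetical argument bundle: everything the worker touches travels in here.
struct WorkerArgs {
    const float* scores;  // input array, owned by the caller
    int* predictions;     // output array, owned by the caller
    int count;            // number of elements in both arrays
};

// Thread entry point: writes through args->predictions, not through a global.
static void* label_worker(void* ptr) {
    WorkerArgs* args = static_cast<WorkerArgs*>(ptr);
    for (int i = 0; i < args->count; i++) {
        // Stand-in for the real per-query work.
        args->predictions[i] = (args->scores[i] >= 0.5f) ? 1 : 0;
    }
    return NULL;
}

// Fills in every field before starting the thread, then waits for it.
// Returns 0 on success, or the pthread error code on failure.
int run_label_worker(const float* scores, int* predictions, int count) {
    WorkerArgs args;
    args.scores = scores;
    args.predictions = predictions;
    args.count = count;

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, label_worker, &args);
    if (rc != 0) {
        return rc;
    }
    // args lives on this stack frame, so it must outlive the thread:
    // joining here guarantees that.
    return pthread_join(tid, NULL);
}
```

Applied to the question's code, this would mean setting paramPtr->parampred = predictions (and the other missing fields) in KNN_POSIX2, and having run2 write to paramPtr->parampred[queryIndex].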


Also note: Your KNN_POSIX2(...) starts a new thread, and then it immediately returns a pointer to the predictions array that it allocated. Is that supposed to be the same array that run2() fills in?

If so, then how will the caller of KNN_POSIX2() know when the thread has finished filling it in?
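
The usual answer is pthread_join: the function that starts the threads waits for each of them before it returns the array. Here is a minimal sketch of that pattern, assuming the work can be split into contiguous index ranges; SliceArgs, fill_slice, and compute_squares are hypothetical names, and squaring the index stands in for classifying one query.

```cpp
#include <pthread.h>
#include <stdlib.h>

// Hypothetical per-thread slice: each thread fills out[begin, end).
struct SliceArgs {
    int* out;
    int begin;
    int end;
};

static void* fill_slice(void* ptr) {
    SliceArgs* s = static_cast<SliceArgs*>(ptr);
    for (int i = s->begin; i < s->end; i++) {
        s->out[i] = i * i;  // stand-in for the per-query work
    }
    return NULL;
}

// Allocates the result array, splits [0, n) across t threads, and returns
// only after every started thread has been joined. The caller frees the
// result. Returns NULL on invalid arguments or on any failure.
int* compute_squares(int n, int t) {
    if (n < 0 || t < 1) {
        return NULL;
    }
    int* out = (int*)malloc((n > 0 ? n : 1) * sizeof(int));
    pthread_t* tids = (pthread_t*)malloc(t * sizeof(pthread_t));
    SliceArgs* args = (SliceArgs*)malloc(t * sizeof(SliceArgs));
    if (!out || !tids || !args) {
        free(out); free(tids); free(args);
        return NULL;
    }

    int started = 0;
    int ok = 1;
    for (int i = 0; i < t; i++) {
        args[i].out = out;
        args[i].begin = (int)((long long)n * i / t);
        args[i].end = (int)((long long)n * (i + 1) / t);
        if (pthread_create(&tids[i], NULL, fill_slice, &args[i]) != 0) {
            ok = 0;
            break;
        }
        started++;
    }

    // Join is the synchronization point: after it returns, that thread's
    // writes to out are complete and visible here.
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) != 0) {
            ok = 0;
        }
    }

    free(tids);
    free(args);
    if (!ok) {
        free(out);
        return NULL;
    }
    return out;
}
```

If run2 were split this way, each thread would also need its own candidates and classCounts buffers, since the code in the question writes to both for every query and sharing them between threads would be a data race.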

Upvotes: 2
