user3830430

Reputation: 23

C threads program slower than sequential version

I wrote a program that approximates an integral using a Riemann sum. It uses several threads, but its performance is worse than that of a sequential version I wrote later. Algorithm-wise the two are identical apart from the threading, so the question is: what's wrong with it? I assume pthread_join is not the problem, because if a thread finishes before the one currently being joined, the later join on it will simply return immediately. Is that correct? I'm aware that the free call is probably wrong and that there is no error checking on thread creation; I removed that while testing various things. Thanks in advance.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <time.h>


int counter = 0;
float sum = 0;
pthread_mutex_t mutx;

float function_res(float);


struct range {
    float left_border;
    int steps;
    float step_range;
};

void *calcRespectiveRange(void *ranges) {
    struct range *rangs = ranges;
    float left_border = rangs->left_border;
    int steps = rangs->steps;
    float step_range = rangs->step_range;
    free(rangs);
    //printf("left: %f steps: %d step range: %f\n", left_border, steps, step_range);
    int i;
    float temp_sum = 0;
    for(i = 0; i < steps; i++) {
        temp_sum += step_range * function_res(left_border);
        left_border += step_range;
    }
    sum += temp_sum;
    pthread_exit(NULL);
}


int main() {
    clock_t begin, end;

    if(pthread_mutex_init(&mutx, NULL) != 0) {
        printf("mutex error\n");
    }
    printf("enter range, amount of steps and threads: \n");
    float left_border, right_border;

    int steps_count;
    int threads_amnt;
    scanf("%f %f %d %d", &left_border, &right_border, &steps_count, &threads_amnt);
    float step_range = (right_border - left_border) / steps_count;
    int i;
    pthread_t tid[threads_amnt];
    float chunk = (right_border - left_border) / threads_amnt;
    int steps_per_thread = steps_count / threads_amnt;
    begin = clock();
    for(i = 0; i < threads_amnt; i++) {
        struct range *ranges;
        ranges = malloc(sizeof(ranges));
        ranges->left_border = i * chunk + left_border;
        ranges->steps = steps_per_thread;
        ranges->step_range = step_range;
        pthread_create(&tid[i], NULL, calcRespectiveRange, (void*) ranges);
    }
    for(i = 0; i < threads_amnt; i++) {
        pthread_join(tid[i], NULL);
    }
    end = clock();
    pthread_mutex_destroy(&mutx);
    printf("\n%f\n", sum);

    double time_spent = (double) (end - begin) / CLOCKS_PER_SEC;
    printf("Time spent: %lf\n", time_spent);
    return(0);
}

float function_res(float lb) {
    return(lb * lb + 4 * lb + 3);
}

Edit: in short, can it be improved to reduce execution time (with mutexes, for example)?

Upvotes: 2

Views: 122

Answers (1)

Kenney

Reputation: 9093

The execution time will be shortened, provided you have multiple hardware threads available.

The problem is in how you measure time: clock returns the processor time used by the program. That means it sums the CPU time used by all the threads. If your program uses 2 threads, and its wall-clock execution time is 1 second, then each thread has used 1 second of CPU time, and clock will return the equivalent of 2 seconds.

To get the actual elapsed (wall-clock) time on Linux, use gettimeofday. I modified your code by adding

#include <sys/time.h>

and capturing the start time before the loop:

struct timeval tv_start;
gettimeofday( &tv_start, NULL );

and after:

struct timeval tv_end;
gettimeofday( &tv_end, NULL );

and calculating the difference in seconds:

printf("CPU Time:    %lf\nTime passed: %lf\n",
    time_spent,
    ((tv_end.tv_sec * 1000*1000.0 + tv_end.tv_usec) -
    (tv_start.tv_sec * 1000*1000.0 + tv_start.tv_usec)) / 1000/1000
);

(I also fixed the allocation: malloc(sizeof(ranges)) allocates only the size of a pointer (4 or 8 bytes on a 32-/64-bit CPU), not the size of the structure, so I changed it to malloc(sizeof(struct range)) (12 bytes).)

When running with the input parameters 0 1000000000 1000000000 1, that is, 1 billion iterations in 1 thread, the output on my machine is:

CPU Time:    4.352000
Time passed: 4.400006

When running with 0 1000000000 1000000000 2, that is, 1 billion iterations spread over 2 threads (500 million iterations each), the output is:

CPU Time:    4.976000
Time passed: 2.500003

For completeness' sake, I tested it with the input 0 1000000000 1000000000 4:

CPU Time:    8.236000
Time passed: 2.180114

It is a little faster, but not twice as fast as with 2 threads, and it uses double the CPU time. This is because my CPU is a Core i3, a dual-core with hyperthreading: the two extra logical threads are not true hardware cores.

Upvotes: 2
