Yang Wenhao
Yang Wenhao

Reputation: 564

What costs the extra execution time of the routine in a pthread program?

I wrote four different programs to count total words in two files. These four versions look mostly the same. First three versions use two threads to count and just the orders of three statements are different. The last version uses one thread to count. I will list the different part of each version and the common part first, then the output of each version and my question.

different part:

// version 1
count_words(&file1); 
pthread_create(&new_thread, NULL, count_words, &file2);
pthread_join(new_thread, NULL);

// version 2
pthread_create(&new_thread, NULL, count_words, &file2);
count_words(&file1);
pthread_join(new_thread, NULL);

// version 3
pthread_create(&new_thread, NULL, count_words, &file2);
pthread_join(new_thread, NULL);
count_words(&file1);

// version 4
count_words(&file1);
count_words(&file2);

common part: (Insert the different part into this common part to make a complete version)

#include <stdio.h>
#include <pthread.h>
#include <ctype.h>
#include <stdlib.h>
#include <time.h>

#define N 2000

typedef struct file_t {
    char *name;
    int words;
} file_t;

double time_diff(struct timespec *, struct timespec *);
void *count_words(void *);

// Usage: progname file1 file2
int main(int argc, char *argv[]) {
    pthread_t new_thread;
    file_t file1, file2;

    file1.name = argv[1];
    file1.words = 0;
    file2.name= argv[2];
    file2.words = 0;

    // Insert different part here

    printf("Total words: %d\n", file1.words+file2.words);
    return 0;
}

void *count_words(void *arg) {
    FILE *fp;
    file_t *file = (file_t *)arg;
    int i, c, prevc = '\0';
    struct timespec process_beg, process_end;
    struct timespec thread_beg, thread_end;
    double process_diff, thread_diff;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_beg);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_beg);

    fp = fopen(file->name, "r");
    for (i = 0; i < N; i++) {
        while ((c = getc(fp)) != EOF) {
            if (!isalnum(c) && isalnum(prevc))
                file->words++;
            prevc = c;
        }
        fseek(fp, 0, SEEK_SET);
    }
    fclose(fp);

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_end);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_end);

    process_diff = time_diff(&process_beg, &process_end);
    thread_diff = time_diff(&thread_beg, &thread_end);
    printf("count_words() in %s takes %.3fs process time and" 
           "%.3fs thread time\n", file->name, process_diff, thread_diff);
    return NULL;
}

double time_diff(struct timespec *beg, struct timespec *end) {
    return ((double)end->tv_sec + (double)end->tv_nsec*1.0e-9)
         - ((double)beg->tv_sec + (double)beg->tv_nsec*1.0e-9);
}

NOTE

  1. file1 is a file with 10000 words of "word". file2 is a copy of file1, created by cp command.
  2. To make the execution time long enough, program repeatly counts the words. N is the number of loops. So the result is not the accurate number of total words, but that multiplies by N.
  3. Please don't put too much emphasis on the counting algorithm. I am just concerned about the execution time in this example.
  4. An important information: The machine is Intel® Celeron(R) CPU 420 @ 1.60GHz. one core. The OS is Linux 3.2.0. Maybe one core is the cause of this strange phenomenon like others said. But I still want to figure it out.

The program counts words and uses clock_gettime() to calculate the process cpu time and the thread cpu time of routine count_words() and then output the times and the word number. Below is the output and my comment with questions. I will be very appreciated if someone can explain the reason what the extra time is taken on.

// version 1
count_words() in file1 takes 2.563s process time and 2.563s thread time
count_words() in file2 takes 8.374s process time and 8.374s thread time
Total words: 40000000

Comment: The original thread finishes count_words() and waits for the new thread to die. When count_words() running in the new thread, no context switch happens (because process time == thread time). Why it takes so much time? What happens in count_words() in the new thread?

// version 2
count_words() in file1 takes 16.755s process time and 8.377s thread time
count_words() in file2 takes 16.753s process time and 8.380s thread time
Total words: 40000000

Comment: Two threads parallel runs here. Context switch happens, so the process time > thread time.

// version 3
count_words() in file2 takes 8.374s process time and 8.374s thread time
count_words() in file1 takes 8.365s process time and 8.365s thread time
Total words: 40000000

Comment: New thread counts first and original thread waits for it. After new thread is joined, original thread begins to count. Neither of them has context switching, why so much time taken, especially the count after the new thread joined?

// version 4
count_words() in file1 takes 2.555s process time and 2.555s thread time
count_words() in file2 takes 2.556s process time and 2.556s thread time
Total words: 40000000

Comment: Fastest version. No new thread created. Both count_words() runs in a single thread.

Upvotes: 6

Views: 983

Answers (1)

zch
zch

Reputation: 15278

It's probably because creation of any thread forces libc to use synchronization in getc. This makes this function significantly slower. Following example is for me as slow as version 3:

void *skip(void *p){ return NULL; };
pthread_create(&new_thread, NULL, skip, NULL);
count_words(&file1); 
count_words(&file2); 

To fix this problem you can use a buffer:

for (i = 0; i < N; i++) {
    char buffer[BUFSIZ];
    int read;
    do {
        read = fread(buffer, 1, BUFSIZ, fp);
        int j;
        for(j = 0; j < read; j++) {
            if (!isalnum(buffer[j]) && isalnum(prevc))
                file->words++;
            prevc = buffer[j];
        }
    } while(read == BUFSIZ);
    fseek(fp, 0, SEEK_SET);
}

In this solution, IO functions are called rarely enough to make synchronization overhead insignificant. This not only solves problem of weird timings, but also makes it several times faster. For me it's reduction from 0.54s (without threads) or 0.85s (with threads) to 0.15s (in both cases).

Upvotes: 7

Related Questions