jmlopez

Reputation: 4953

Using a thread in C++ to report progress of computations

I'm writing a generic abstract class to be able to report on the status of as many instance variables as we need. For instance, consider the following useless loop:

int a, b;
for (int i=0; i < 10000; ++i) {
    for (int j=0; j < 1000; ++j) {
        for (int k =0; k < 1000; ++k) {
            a = i;
            b = j;
        }
    }
}

It would be nice to be able to see the values of a and b without having to modify the loop. In the past I have written if statements such as the following:

int a, b;
for (int i=0; i < 10000; ++i) {
    for (int j=0; j < 1000; ++j) {
        for (int k =0; k < 1000; ++k) {
            a = i;
            b = j;
            if (a % 100 == 0) {
                printf("a = %d\n", a);
            }
        }
    }
}

This would allow me to see the value of a whenever it reaches a multiple of 100. However, depending on the computations being done, sometimes it is just not possible to check on the progress in this fashion. The idea is to be able to walk away from the computer, come back after a given time, and check on whatever values you want to see.

To this end we can use pthreads. The following code works; I am posting it only because I'm not sure I'm using the thread correctly, in particular how to shut it down.

First, let's consider the file "reporter.h":

#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <time.h>   // nanosleep, struct timespec

void* run_reporter(void*);

class reporter {
public: 
    pthread_t thread;
    bool stdstream;
    FILE* fp;

    struct timespec sleepTime;
    struct timespec remainingSleepTime;

    const char* filename;
    const int sleepT;
    double totalTime;

    reporter(int st, FILE* fp_): fp(fp_), filename(NULL), stdstream(true), sleepT(st) {
        begin_report();
    }
    reporter(int st, const char* fn): fp(NULL), filename(fn), stdstream(false), sleepT(st) {
        begin_report();
    }
    void begin_report() {
        totalTime = 0;
        if (!stdstream) fp = fopen(filename, "w");
        fprintf(fp, "reporting every %d seconds ...\n", sleepT);
        if (!stdstream) fclose(fp);
        pthread_create(&thread, NULL, run_reporter, this);
    }
    void sleep() {
        sleepTime.tv_sec=sleepT;
        sleepTime.tv_nsec=0;
        nanosleep(&sleepTime, &remainingSleepTime);
        totalTime += sleepT;
    }
    virtual void report() = 0;
    void end_report() {
        pthread_cancel(thread);
        // Wrong addition of remaining time, needs to be fixed
        // but non-important at the moment.
        //totalTime += sleepT - remainingSleepTime.tv_sec;
        long sec = remainingSleepTime.tv_sec;
        if (!stdstream) fp = fopen(filename, "a");
        fprintf(fp, "reported for %g seconds.\n", totalTime);
        if (!stdstream) fclose(fp);
    }
};

void* run_reporter(void* rep_){
    reporter* rep = (reporter*)rep_;
    while(1) {
        if (!rep->stdstream) rep->fp = fopen(rep->filename, "a");
        rep->report();
        if (!rep->stdstream) fclose(rep->fp);
        rep->sleep();
    }
}

This file declares the abstract class reporter; notice the pure virtual function report, which is the function that prints the messages. Each reporter has its own thread, which is created when the reporter constructor is called. To use the reporter object in our useless loop, we can now do:

#include "reporter.h"
int main() {
    // Declaration of objects we want to track
    int a = 0;
    int b = 0;
    // Declaration of reporter
    class prog_reporter: public reporter {
    public:
        int& a;
        int& b;
        prog_reporter(int& a_, int& b_):
            a(a_), b(b_),
            reporter(3, stdout)
        {}
        void report() {
            fprintf(fp, "(a, b) = (%d, %d)\n", this->a, this->b);
        }
    };
    // Start tracking a and b every 3 seconds
    prog_reporter rep(a, b);

    // Do some useless computation
    for (int i=0; i < 10000; ++i) {
        for (int j=0; j < 1000; ++j) {
            for (int k =0; k < 1000; ++k) {
                a = i;
                b = j;
            }
        }
    }
    // Stop reporting
    rep.end_report();
}

After compiling this code (no optimization flag) and running it I obtain:

macbook-pro:Desktop jmlopez$ g++ testing.cpp
macbook-pro:Desktop jmlopez$ ./a.out 
reporting every 3 seconds ...
(a, b) = (0, 60)
(a, b) = (1497, 713)
(a, b) = (2996, 309)
(a, b) = (4497, 478)
(a, b) = (5996, 703)
(a, b) = (7420, 978)
(a, b) = (8915, 78)
reported for 18 seconds.

This does exactly what I wanted it to do. With the optimization flag, however, I get:

macbook-pro:Desktop jmlopez$ g++ testing.cpp -O3
macbook-pro:Desktop jmlopez$ ./a.out 
reporting every 3 seconds ...
(a, b) = (0, 0)
reported for 0 seconds.

This is not surprising, since the compiler probably rewrote my code to give me the same answer in a shorter amount of time. My original question was going to be why the reporter did not give me the values of the variables when I made the loops longer, for instance:

for (int i=0; i < 1000000; ++i) {
    for (int j=0; j < 100000; ++j) {
        for (int k =0; k < 100000; ++k) {
            a = i;
            b = j;
        }
    }
}

After running the code again with the optimization flag:

macbook-pro:Desktop jmlopez$ g++ testing.cpp -O3
macbook-pro:Desktop jmlopez$ ./a.out 
reporting every 3 seconds ...
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
reported for 39 seconds.

Question: Is this output due to the optimization flag, which modifies the code so that it simply does not update the variables until the very end?

Main question:

In the reporter method end_report I call the function pthread_cancel. After reading the following answer, I became doubtful about my use of that function and about how I was terminating the reporting thread. For those experienced with pthreads, are there any obvious holes or potential problems in using the thread as I have done?

Upvotes: 6

Views: 880

Answers (2)

sonicwave

Reputation: 6102

About the main question: You're close. Add a call to pthread_join() (http://linux.die.net/man/3/pthread_join) after pthread_cancel(), and everything should be fine.

The join call makes sure that the thread's resources are cleaned up; forgetting it can, in certain cases, lead to running out of threading resources.

And just to add, the important point when using pthread_cancel() (apart from remembering to join the thread) is to make sure that the thread you are canceling has a so-called cancellation point, which your thread does by calling nanosleep() (and possibly also fopen, fprintf and fclose which may be cancellation points). If no cancellation point exists, your thread will just keep running.
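The cancel-then-join sequence described above can be sketched in isolation as follows. This is only a minimal illustration, not the question's reporter class: sleepy_worker and cancel_and_join_demo are hypothetical names, and the worker does nothing except sleep so that nanosleep() serves as its cancellation point.

```cpp
#include <pthread.h>
#include <time.h>
#include <cstddef>

// Hypothetical worker: sleeps in a loop forever.
// nanosleep() is a cancellation point, so a pending cancel is acted on there.
static void* sleepy_worker(void*) {
    struct timespec ts;
    ts.tv_sec = 0;
    ts.tv_nsec = 50 * 1000 * 1000;  // 50 ms per nap
    for (;;) {
        nanosleep(&ts, NULL);
    }
    return NULL;  // never reached
}

// Starts the worker, requests cancellation, then joins it.
// Returns true if the thread was reaped and reported as canceled.
bool cancel_and_join_demo() {
    pthread_t thread;
    if (pthread_create(&thread, NULL, sleepy_worker, NULL) != 0) return false;

    // Ask the thread to stop at its next cancellation point.
    if (pthread_cancel(thread) != 0) return false;

    // Wait for it to actually terminate and release its resources.
    void* result = NULL;
    if (pthread_join(thread, &result) != 0) return false;

    return result == PTHREAD_CANCELED;
}
```

In the question's code, the equivalent change would presumably be a pthread_join(thread, NULL) call placed directly after pthread_cancel(thread) in end_report.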

Upvotes: 2

6502

Reputation: 114579

C++ doesn't know about threads and your code uses two local variables a and b and makes no call to a function with unknown code.

What happens is that a and b end up in registers during the loop for the computation and they're updated only at the end of the loop.

While it's true that a and b must get a real memory address (because they've been passed by reference to an external function), the compiler doesn't know that some external code that knows the address of a and b will execute during the loop, and thus prefers to keep all intermediate values in registers until the loop ends.

If, however, the code in your loop calls an unknown function (i.e. a function whose implementation is not visible to the compiler), then the compiler is forced to update a and b before calling it, because it must be paranoid and assume that the function that was given the address of a and b may have passed this information on to the unknown function.
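A different way to get the same effect, not mentioned in the answer above and assuming a C++11 compiler is available, is to make the shared variable a std::atomic, so that stores made in the loop are meant to be observable from another thread. The sketch below is a hypothetical stand-alone illustration (observe_progress is an invented name) and uses std::thread rather than the question's pthread-based reporter.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical demo: the calling thread counts upward in a tight loop while
// an observer thread polls the counter. Returns the first non-zero value
// the observer saw while the loop was still running.
long long observe_progress() {
    std::atomic<long long> a(0);             // shared progress variable
    std::atomic<bool> observer_done(false);  // set once progress was seen
    long long seen = 0;                      // written by observer, read after join

    std::thread observer([&] {
        // Poll until the worker's stores become visible.
        while ((seen = a.load(std::memory_order_relaxed)) == 0) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        observer_done.store(true, std::memory_order_relaxed);
    });

    // "Useless computation": each iteration publishes its progress.
    // An atomic store, unlike a plain int assignment, is not expected to
    // be held back in a register until the loop ends.
    for (long long i = 1; !observer_done.load(std::memory_order_relaxed); ++i) {
        a.store(i, std::memory_order_relaxed);
    }

    observer.join();
    return seen;
}
```

Applied to the question's example, a and b would presumably become std::atomic<int> objects and prog_reporter would hold references to those; whether the extra cost of atomic stores in the inner loop is acceptable depends on the real computation.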

Upvotes: 3
