Reputation: 301
I'm confusing about the performance of my code, when dealing with single thread it only using 13s, but it's will consume 80s. I don't know whether the vector can only be accessed by one thread at a time, if so it's likely I have to use a struct array to store data instead of vector, could anyone kindly help?
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iterator>
#include <string>
#include <ctime>
#include <bangdb/database.h>
#include "SEQ.h"
#define NUM_THREADS 16
using namespace std;
typedef struct _thread_data_t {
std::vector<FDT> *Query;
unsigned long start;
unsigned long end;
connection* conn;
int thread;
} thread_data_t;
void *thr_func(void *arg) {
thread_data_t *data = (thread_data_t *)arg;
std::vector<FDT> *Query = data->Query;
unsigned long start = data->start;
unsigned long end = data->end;
connection* conn = data->conn;
printf("thread %d started %lu -> %lu\n", data->thread, start, end);
for (unsigned long i=start;i<=end ;i++ )
{
FDT *fout = conn->get(&((*Query).at(i)));
if (fout == NULL)
{
//printf("%s\tNULL\n", s);
}
else
{
printf("Thread:%d\t%s\n", data->thread, fout->data);
}
}
pthread_exit(NULL);
}
int main(int argc, char *argv[])
{
if (argc<2)
{
printf("USAGE: ./seq <.txt>\n");
printf("/home/rd/SCRIPTs/12X18610_L5_I052.R1.clean.code.seq\n");
exit(-1);
}
printf("%s\n", argv[1]);
vector<FDT> Query;
FILE* fpin;
if((fpin=fopen(argv[1],"r"))==NULL) {
printf("Can't open Input file %s\n", argv[1]);
return -1;
}
char *key = (char *)malloc(36);
while (fscanf(fpin, "%s", key) != EOF)
{
SEQ * sequence = new SEQ(key);
FDT *fk = new FDT( (void*)sequence, sizeof(*sequence) );
Query.push_back(*fk);
}
unsigned long Querysize = (unsigned long)(Query.size());
std::cout << "myvector stores " << Querysize << " numbers.\n";
//create database, table and connection
database* db = new database((char*)"berrydb");
//get a table, a new one or existing one, walog tells if log is on or off
table* tbl = db->gettable((char*)"hg19", JUSTOPEN);
if(tbl == NULL)
{
printf("ERROR:table NULL error");
exit(-1);
}
//get a new connection
connection* conn = tbl->getconnection();
if(conn == NULL)
{
printf("ERROR:connection NULL error");
exit(-1);
}
cerr<<"begin querying...\n";
time_t begin, end;
double duration;
begin = clock();
unsigned long ThreadDealSize = Querysize/NUM_THREADS;
cerr<<"Querysize:"<<ThreadDealSize<<endl;
pthread_t thr[NUM_THREADS];
int rc;
thread_data_t thr_data[NUM_THREADS];
for (int i=0;i<NUM_THREADS ;i++ )
{
unsigned long ThreadDealStart = ThreadDealSize*i;
unsigned long ThreadDealEnd = ThreadDealSize*(i+1) - 1;
if (i == (NUM_THREADS-1) )
{
ThreadDealEnd = Querysize-1;
}
thr_data[i].conn = conn;
thr_data[i].Query = &Query;
thr_data[i].start = ThreadDealStart;
thr_data[i].end = ThreadDealEnd;
thr_data[i].thread = i;
}
for (int i=0;i<NUM_THREADS ;i++ )
{
if (rc = pthread_create(&thr[i], NULL, thr_func, &thr_data[i]))
{
fprintf(stderr, "error: pthread_create, rc: %d\n", rc);
return EXIT_FAILURE;
}
}
for (int i = 0; i < NUM_THREADS; ++i) {
pthread_join(thr[i], NULL);
}
cerr<<"done\n"<<endl;
end = clock();
duration = double(end - begin) / CLOCKS_PER_SEC;
cerr << "runtime: " << duration << "\n" << endl;
db->closedatabase(OPTIMISTIC);
delete db;
printf("Done\n");
return EXIT_SUCCESS;
}
Upvotes: 2
Views: 696
Reputation: 76296
Like all data structures in standard library, methods of vector
are reentrant, but not thread-safe. That means different instances can be accessed by multiple threads independently, but each instance may only be accessed by one thread at a time and you have to ensure it. But since you have separate vector for each thread, that's not your problem.
What is probably your problem is the printf
. printf
is thread-safe, meaning you can call it from any number of threads at the same time, but at the cost of being wrapped in mutual exclusion internally.
Majority of work in the threaded part of your program is done inside printf
. So what probably happens is that all the threads are started and quickly get to the printf
, where all but the first will stop. When the printf finishes and releases the mutex, system considers scheduling the threads that were waiting for it. It probably does, so rather slow context switch happens. And repeats after every printf
.
How exactly it happens depends on which actual locking primitive is being used, which depends on your operating system and standard library versions. The system should each time wake up only the next sleeper, but many implementations actually wake up all of them. So in addition to the printf
s being executed in mostly round-robin fashion, incurring one context switch for each, there may be quite a few additional spurious wake-ups in which the thread just finds the lock is held and goes back to sleep.
So the lesson from this is that threads don't make things automagically faster. They only help when:
And they never help when they all manipulate the same objects, so they end up spending most of the time waiting for locks anyway. In addition to any of your own objects that you lock, keep in mind that input and output file handles have to be internally locked as well.
Memory allocation also needs to internally synchronize between threads, but modern allocators have separate pools for threads to avoid much of it; if the default allocator proves to be too slow with many threads, there are some specialized ones you can use.
Upvotes: 4