Reputation: 73
I am running the following code:
when I run this code with 1 child process: i get the following timing information:
(I run using /usr/bin/time ./job 1)
5.489u 0.090s 0:05.58 99.8% (1 job running)
when I run with 6 children processes: i get following
74.731u 0.692s 0:12.59 599.0% (6 jobs running in parallel)
The machine I am running the experiment on has 6 cores, 198 GB of RAM and nothing else is running on that machine.
I was expecting the user time reporting to be 6 times in case of 6 jobs running in parallel. But it is much more than that (13.6 times). My questions is from where this increase in user time comes from? Is it because multiple cores are jumping from one memory location to another more frequently in case of 6 jobs running in parallel? Or there is something else I am missing.
Thanks
#define MAX_SIZE 7000000
#define LOOP_COUNTER 100
#define simple_struct struct _simple_struct
simple_struct {
int n;
simple_struct *next;
};
#define ALLOCATION_SPLIT 5
#define CHAIN_LENGTH 1
void do_function3(void)
{
int i = 0, j = 0, k = 0, l = 0;
simple_struct **big_array = NULL;
simple_struct *temp = NULL;
big_array = calloc(MAX_SIZE + 1, sizeof(simple_struct*));
for(k = 0; k < ALLOCATION_SPLIT; k ++) {
for(i =k ; i < MAX_SIZE; i +=ALLOCATION_SPLIT) {
big_array[i] = calloc(1, sizeof(simple_struct));
if((CHAIN_LENGTH-1)) {
for(l = 1; l < CHAIN_LENGTH; l++) {
temp = calloc(1, sizeof(simple_struct));
temp->next = big_array[i];
big_array[i] = temp;
}
}
}
}
for (j = 0; j < LOOP_COUNTER; j++) {
for(i=0 ; i < MAX_SIZE; i++) {
if(big_array[i] == NULL) {
big_array[i] = calloc(1, sizeof(simple_struct));
}
big_array[i]->n = i * 13;
temp = big_array[i]->next;
while(temp) {
temp->n = i*13;
temp = temp->next;
}
}
}
}
int main(int argc, char **argv)
{
int i, no_of_processes = 0;
pid_t pid, wpid;
int child_done = 0;
int status;
if(argc != 2) {
printf("usage: this_binary number_of_processes");
return 0;
}
no_of_processes = atoi(argv[1]);
for(i = 0; i < no_of_processes; i ++) {
pid = fork();
switch(pid) {
case -1:
printf("error forking");
exit(-1);
case 0:
do_function3();
return 0;
default:
printf("\nchild %d launched with pid %d\n", i, pid);
break;
}
}
while(child_done != no_of_processes) {
wpid = wait(&status);
child_done++;
printf("\nchild done with pid %d\n", wpid);
}
return 0;
}
Upvotes: 1
Views: 58
Reputation: 1666
Firstly, your benchmark is a bit unusual. Normally, when benchmarking concurrent applications, one would compare two implementations:
Then you divide the execution times to obtain the speedup.
If your speedup is:
The effect on performance depends on a variety of factors:
You can measure the last 2 with a profiler. Look for cache misses and stalled instructions.
Upvotes: 2