Reputation: 1637
I have a txt file that contains 2 graphs and the number of vertices in the following format:
6
0 1 0 1 0 0
1 0 1 0 0 1
0 1 0 1 0 0
1 0 1 0 1 0
0 0 0 1 0 1
0 1 0 0 1 0
0 1 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 1
0 0 1 0 1 0
1 0 0 1 0 1
0 0 1 0 1 0
The matrices represent vertice adjacency. If two vertices are adjacent, their pair gets 1. Although the graphs are not separated visually, the second graph starts after the 6th row of the first. Each graph can have a lot of vertices, like 5000 and they are both of the same size (the graphs). I wrote an algorithm that checks if the two graphs are isomorphic and i noticed that reading the graphs takes 8 seconds and the actual algorithm takes 2.5 (for 5000 vertices). Since my goal is to optimize the overall speed of my program, I want to know if i can improve (in terms of speed) my current code of reading from file:
FILE* file = fopen ("input.txt", "r");
fscanf (file, "%d", &i);
int n = i;
while (!feof (file))
{
fscanf (file, "%d", &i);
if (j < (n*n)) { // first graph
if (i==1) {
adj_1[j/n][v_rank_1[j/n]] = j - (j/n)*n; // add the vertice to the adjacents of the current vertice
v_rank_1[j/n] += 1;
}
}
else if (j>=(n*n)) { // second graph
if (i==1) {
adj_2[(j-(n*n))/n][v_rank_2[(j-(n*n))/n]] = (j-(n*n)) - ((j-(n*n))/n)*n; // add the vertice to the adjacents of the current vertice
v_rank_2[(j-(n*n))/n] += 1;
}
}
j++;
}
fclose (file);
The adj_*
table holds the indexes of the adjacent vertices of a vertice
The v_rank_*
table holds the number of vertices adjacent to a vertice
It is important that I acquire this and only this information from the graph.
Upvotes: 1
Views: 209
Reputation: 1842
The first optimization is to read the whole file in memory in one shot. Accessing memory in the loops will be faster than calling fread.
The second optimization is to do less arythmetic operations, even if it means more code.
Third optimization is treating the data from file as characters to avoid integer conversion.
The result could be:
// bulk read file into memory
fseek(file, 0, SEEK_END);
long fsize = ftell(file);
fseek(file, 0, SEEK_SET);
char *memFile = malloc(fsize + 1);
if (memFile == NULL) return; // not enough memory !! Handle it as you wish
fscanf(file, "%d", &n);
fread(memFile, fsize, 1, file);
fclose(file);
memfile[fsize] = 0;
// more code but less arythmetic operations
int lig, col;
char *mem = memFile, c;
for (int lig = 0; lig < n; lig++) { // first graph
for (int col = 0; col < n; col++) {
for (;;)
{
c = *mem;
if (c == 0) break;
mem++;
if (c == '1') {
adj_1[lig][v_rank_1[lig]++] = col; // add the vertice to the adjacents of the current vertice
k++; // ??
break;
}
if (c == '0') break;
}
}
}
for (int lig = 0; lig < n; lig++) { // second graph
for (int col = 0; col < n; col++) {
c = *mem;
if (c == 0) break;
mem++;
if (c == '1') {
adj_2[(lig][v_rank_2[lig]++] = col; // add the vertice to the adjacents of the current vertice
l++; // ??
break;
}
if (c == '0') break;
}
}
}
free(memFile);
Remarks: you said nothing about variables k
and l
.
Upvotes: 3
Reputation: 96966
Use fgets()
to read a line at a time into a line buffer. Parse the line buffer into integer values.
This function reduces the number of times you read from the file, because behind the scenes, fgets()
reads a large chunk of data from the file and returns a line at a time. It only attempts to read another chunk when there are no more lines left in its internal buffer.
Upvotes: 0
Reputation: 2211
You could speed it up by accessing the file system less often. You are reading one integer at a time from the file thus accessing the file every time through the loop.
Instead, try reading the whole file or a large chunk of the file at once. (This is called block reading). You can buffer it into an array. Inside your loop, read from the memory buffer instead of the file. Refresh your memory buffer as needed inside the loop if you don't read in the entire file.
Upvotes: 2