In terms of I/O, I'd expect Python and C to have similar performance, but I'm seeing C being from 1.5 to 2 times faster than Python for a similar implementation.
The task is simple: concatenate thousands of ~250-byte text files, each containing two lines:
Header1 \t Header2 \t ... HeaderN
float1 \t float2 \t ... floatN
The header is the same for all files, so it is read only once and the output file will look like:
Header1 \t Header2 \t ... HeaderN
float1 \t float2 \t ... floatN
float1 \t float2 \t ... floatN
float1 \t float2 \t ... floatN
... thousands of lines
float1 \t float2 \t ... floatN
Here is my implementation in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // strstr, strcat
#include <unistd.h>   // chdir
#include <dirent.h>
#include <time.h>
#define LINE_SIZE 300
#define BUFFER_SZ (5000*LINE_SIZE)
void combine(char *fname) {
    DIR *d;
    FILE *fp;
    char line[LINE_SIZE];
    char buffer[BUFFER_SZ];
    short flagHeader = 1;
    buffer[0] = '\0'; // need to init buffer before strcat to it
    struct dirent *dir;
    chdir("runs");
    d = opendir(".");
    if (d) {
        while ((dir = readdir(d)) != NULL) {
            if ((strstr(dir->d_name, "Hs")) && (strstr(dir->d_name, ".txt")) ) {
                fp = fopen(dir->d_name, "r");
                fgets(line, LINE_SIZE, fp); // read first line
                if (flagHeader) { // append it to buffer only once
                    strcat(buffer, line);
                    flagHeader = 0;
                }
                fgets(line, LINE_SIZE, fp); // read second line
                strcat(buffer, line);
                fclose(fp);
            }
        }
        closedir(d);
        chdir("..");
        fp = fopen(fname, "w");
        fputs(buffer, fp); // buffer is data, not a format string
        fclose(fp);
    }
}
int main() {
    clock_t tc;
    int msec;
    tc = clock();
    combine("results_c.txt");
    msec = (clock() - tc) * 1000 / CLOCKS_PER_SEC;
    printf("elapsed time: %d.%03ds\n", msec/1000, msec%1000); // zero-pad the milliseconds
    return 0;
}
And in Python:
import glob
from time import time
def combine(wildcard, fname='results.txt'):
    """Concatenates all files matching a name pattern into one file.
    Assumes that the files have 2 lines, the first one being the header.
    """
    files = glob.glob(wildcard)
    buffer = ''
    flagHeader = True
    for file in files:
        with open(file, 'r') as pf:
            lines = pf.readlines()
        if not len(lines) == 2:
            print('Error reading file %s. Skipping.' % file)
            continue
        if flagHeader:
            buffer += lines[0]
            flagHeader = False
        buffer += lines[1]
    with open(fname, 'w') as pf:
        pf.write(buffer)
if __name__ == '__main__':
    et = time()
    combine('runs\\Hs*.txt')
    et = time() - et
    print("elapsed time: %.3fs" % et)
And a benchmark of 10 runs each. The files are on a local network drive in a busy office, so I guess that explains the variation:
Run 1/10
C elapsed time: 9.530s
Python elapsed time: 10.225s
===================
Run 2/10
C elapsed time: 5.378s
Python elapsed time: 10.613s
===================
Run 3/10
C elapsed time: 6.534s
Python elapsed time: 13.971s
===================
Run 4/10
C elapsed time: 5.927s
Python elapsed time: 14.181s
===================
Run 5/10
C elapsed time: 5.981s
Python elapsed time: 9.662s
===================
Run 6/10
C elapsed time: 4.658s
Python elapsed time: 9.757s
===================
Run 7/10
C elapsed time: 10.323s
Python elapsed time: 19.032s
===================
Run 8/10
C elapsed time: 8.236s
Python elapsed time: 18.800s
===================
Run 9/10
C elapsed time: 7.580s
Python elapsed time: 15.730s
===================
Run 10/10
C elapsed time: 9.465s
Python elapsed time: 20.532s
===================
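As a rough summary of the ten runs listed above, the sketch below computes the Python/C ratio of mean times and the smallest and largest per-run ratio. The timings are copied from the listing; the function name `summarize` is only illustrative.

```python
# Timings copied from the ten runs listed above (seconds).
c_times = [9.530, 5.378, 6.534, 5.927, 5.981, 4.658, 10.323, 8.236, 7.580, 9.465]
py_times = [10.225, 10.613, 13.971, 14.181, 9.662, 9.757, 19.032, 18.800, 15.730, 20.532]

def summarize(c, py):
    """Return (ratio of means, smallest per-run ratio, largest per-run ratio)."""
    ratios = [p / c_ for c_, p in zip(c, py)]
    mean_ratio = (sum(py) / len(py)) / (sum(c) / len(c))
    return mean_ratio, min(ratios), max(ratios)

mean_ratio, lo, hi = summarize(c_times, py_times)
print("Python/C ratio of means: %.2f (per run: %.2f to %.2f)" % (mean_ratio, lo, hi))
```

With so much run-to-run variation, the ratio of means is probably a more stable figure to quote than any single run.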
Also, a profile run of the Python implementation indeed shows that about 70% of the time is spent in io.open, and the rest in readlines.
In [2]: prun bc.combine('runs\\Hs*.txt')
64850 function calls (64847 primitive calls) in 12.205 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1899 8.391 0.004 8.417 0.004 {built-in method io.open}
1898 3.322 0.002 3.341 0.002 {method 'readlines' of '_io._IOBase' objects}
1 0.255 0.255 0.255 0.255 {built-in method nt.listdir}
Even if readlines is much slower than fgets, the time Python spends in io.open alone is larger than the total runtime in C. Also, in the end, both readlines and fgets read the file line by line, so I'd expect more comparable performance.
So, on to my question: in this particular case, why is Python so much slower than C for I/O?
It boils down to a few things:
Most importantly, the Python version uses text mode (i.e. r and w), which means every line is decoded into a str (Unicode) object and encoded again on output, instead of being handled as raw bytes.
There are many small files and we do very little with each one, so Python's own overhead (e.g. setting up the file objects in open) becomes significant.
Python has to dynamically allocate memory for most things.
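To make the first point concrete, here is a minimal sketch of what the two modes return for the same file. The sample file name and contents are made up for the demonstration, and `read_both_ways` is a hypothetical helper, not part of either benchmarked program.

```python
import os
import tempfile

def read_both_ways(path):
    """Read the same file in text mode and in binary mode."""
    with open(path, 'r') as f:    # text mode: bytes are decoded into str
        as_text = f.read()
    with open(path, 'rb') as f:   # binary mode: raw bytes, no decoding
        as_bytes = f.read()
    return as_text, as_bytes

# Hypothetical sample file, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'Hs_sample.txt')
    with open(sample, 'wb') as f:
        f.write(b'Header1\tHeader2\n1.0\t2.0\n')
    text, raw = read_both_ways(sample)

print(type(text).__name__, type(raw).__name__)  # prints "str bytes"
```

Since the task only copies lines from one file to another, the decoding step does work that is never needed.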
Also note that I/O in this test is not that relevant if you use local files and do multiple runs, since the files will already be cached in memory. The only real I/O will be the final write (and even then, you would have to make sure you are flushing/syncing to disk).
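For reference, a sketch of what flushing and syncing could look like in Python, assuming you want the timing to include the data actually reaching the device. `write_durably` and the file name are illustrative; `flush` and `os.fsync` are the standard calls.

```python
import os
import tempfile

def write_durably(path, data):
    """Write bytes and ask the OS to push them to the storage device."""
    with open(path, 'wb') as f:
        f.write(data)
        f.flush()              # empty Python's userspace buffer
        os.fsync(f.fileno())   # ask the OS to flush its cache to the device

# Throwaway directory, used only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'results.txt')
    write_durably(target, b'Header1\tHeader2\n1.0\t2.0\n')
    size = os.path.getsize(target)

print(size)
```

Without the `os.fsync` call, a benchmark may only be measuring how quickly the data lands in the operating system's cache.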
Now, if you take care of the text mode (i.e. use rb and wb) and also reduce the allocations (less important in this case, but still noticeable), you get something like this:
import glob
def combine():
    flagHeader = True
    with open('results-python-new.txt', 'wb') as fout:
        for filename in glob.glob('runs/Hs*.txt'):
            with open(filename, 'rb') as fin:
                header = fin.readline()
                values = fin.readline()
                if flagHeader:
                    flagHeader = False
                    fout.write(header)
                fout.write(values)
With that, Python already finishes the task in about half the time -- actually faster than the original C version:
Old C: 0.234
Old Python: 0.389
New Python: 0.213
You can possibly still improve the time a bit, e.g. by avoiding the glob.
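One way to avoid the glob, sketched here as an assumption rather than something I measured, is to walk the directory with `os.scandir` and filter on the file name directly. `combine_scandir` and the demo file names are hypothetical; whether this is measurably faster will depend on the Python version and the file system.

```python
import os
import tempfile

def combine_scandir(src_dir, out_path):
    """Concatenate Hs*.txt files, keeping the header of the first file only."""
    need_header = True
    with open(out_path, 'wb') as fout:
        with os.scandir(src_dir) as entries:
            for entry in entries:
                name = entry.name
                # Same intent as the pattern 'Hs*.txt', without pattern matching.
                if not (name.startswith('Hs') and name.endswith('.txt')):
                    continue
                with open(entry.path, 'rb') as fin:
                    header = fin.readline()
                    values = fin.readline()
                if need_header:
                    need_header = False
                    fout.write(header)
                fout.write(values)

# Demonstration on a throwaway directory with made-up file names and values.
with tempfile.TemporaryDirectory() as tmp:
    runs = os.path.join(tmp, 'runs')
    os.mkdir(runs)
    for i in range(3):
        with open(os.path.join(runs, 'Hs%d.txt' % i), 'wb') as f:
            f.write(b'Header1\tHeader2\n%d.0\t%d.5\n' % (i, i))
    out = os.path.join(tmp, 'results.txt')
    combine_scandir(runs, out)
    with open(out, 'rb') as f:
        lines = f.readlines()

print(len(lines))
```

Note that `os.scandir` returns entries in arbitrary order, so the value lines are not guaranteed to appear in file-name order.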
However, if you also apply a couple of similar modifications to the C version, you get a much better time -- about a third of the new Python version's time:
New C: 0.068
Take a look:
#include <stdio.h>
#include <string.h>   // strstr
#include <unistd.h>   // chdir
#include <dirent.h>
#define LINE_SIZE 300
void combine(void) {
    DIR *d;
    FILE *fin;
    FILE *fout;
    struct dirent *dir;
    char headers[LINE_SIZE];
    char values[LINE_SIZE];
    short flagHeader = 1;
    fout = fopen("results-c-new.txt", "wb"); // opened before chdir, so it lands outside "runs"
    chdir("runs");
    d = opendir(".");
    if (d) {
        while ((dir = readdir(d)) != NULL) {
            if ((strstr(dir->d_name, "Hs")) && (strstr(dir->d_name, ".txt")) ) {
                fin = fopen(dir->d_name, "rb");
                fgets(headers, LINE_SIZE, fin);
                fgets(values, LINE_SIZE, fin);
                if (flagHeader) { // write the header only once
                    flagHeader = 0;
                    fputs(headers, fout);
                }
                fputs(values, fout); // write each line directly, no big buffer
                fclose(fin);
            }
        }
        closedir(d);
        fclose(fout);
    }
}