Reputation: 3249
I'm profiling Node.js vs Python for reading a file (48 KB) synchronously.
Node.js code
var fs = require('fs');
var stime = new Date().getTime() / 1000;
for (var i = 0; i < 1000; i++) {
    var content = fs.readFileSync('npm-debug.log');
}
console.log("Total time took is: " + ((new Date().getTime() / 1000) - stime));
Python Code
import time
stime = time.time()
for i in range(1000):
    with open('npm-debug.log', mode='r') as infile:
        ax = infile.read()
print("Total time is: " + str(time.time() - stime))
Timings are as follows:
$ python test.py
Total time is: 0.5195660591125488
$ node test.js
Total time took is: 0.25799989700317383
Where does the difference come from?
Or am I not comparing apples to apples?
EDIT:
PURPOSE:
To understand whether there is any truth to the "Node.js is slower than Python is slower than C" kind of claims, and if so, where exactly the slowness comes from in this context.
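One detail worth noting for the apples-to-apples question: fs.readFileSync() called without an encoding (as above) returns a raw Buffer, while Python's open() in text mode decodes the bytes to str and translates newlines. A closer Python equivalent would read in binary mode; a minimal sketch of that variant (same file as above):
import time

# 'rb' returns bytes with no decoding and no newline translation, which is
# closer to the Buffer that fs.readFileSync() returns when no encoding is given.
stime = time.time()
for i in range(1000):
    with open('npm-debug.log', 'rb') as infile:
        ax = infile.read()
print("Total time is: " + str(time.time() - stime))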
Upvotes: 2
Views: 3159
Reputation: 44926
readlines returns a list of lines in the file, so it has to read the data character by character, constantly comparing the current character to the newline characters, and keep composing a list of lines. This is more complicated than a simple file.read(), which is the equivalent of what Node.js does.
Also, the length calculated by your Python script is the number of lines, while Node.js gets the number of characters.
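To make that distinction concrete, here is a small side-by-side sketch using the same file name as above: read() produces one string whose length is the character count, while readlines() additionally has to split the data into a list of line strings.
with open('npm-debug.log') as f:
    whole = f.read()        # one str; len(whole) is the number of characters

with open('npm-debug.log') as f:
    lines = f.readlines()   # list of str; len(lines) is the number of lines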
If you want even more speed, use os.open instead of open:
import os, time

def Test_os(n):
    for x in range(n):
        f = os.open('Speed test.py', os.O_RDONLY)
        data = ""
        t = os.read(f, 1048576).decode('utf8')
        while t:
            data += t
            t = os.read(f, 1048576).decode('utf8')
        os.close(f)

def Test_open(n):
    for x in range(n):
        with open('Speed test.py') as f:
            data = f.read()

s = time.monotonic()
Test_os(500000)
print(time.monotonic() - s)
s = time.monotonic()
Test_open(500000)
print(time.monotonic() - s)
On my machine os.open is several seconds faster than open. The output is as follows:
53.68909174999999
58.12600833400029
As you can see, open is about 4.4 seconds slower than os.open over the 500000 runs, although as the number of runs decreases, so does the difference.
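For scale, that gap works out to only a few microseconds per call; a quick back-of-the-envelope check using the numbers printed above:
diff = 58.12600833400029 - 53.68909174999999   # ~4.44 s over 500000 runs
print(diff / 500000)                           # ~8.9e-06 s, roughly 9 microseconds per call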
Also, you should try tweaking the buffer size of the os.read function, as different values may give very different timings:
[chart of timings for different buffer sizes; here 'operation' means a single call to Test_os.]
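A minimal sketch of such a buffer-size sweep (the file name, repeat count, and buffer sizes below are only placeholders):
import os, time

def test_buf(path, buf, n=100000):
    # Time n complete reads of `path` with the given os.read buffer size.
    start = time.monotonic()
    for _ in range(n):
        f = os.open(path, os.O_RDONLY)
        while os.read(f, buf):
            pass
        os.close(f)
    return time.monotonic() - start

for buf in (4096, 65536, 1048576):
    print(buf, test_buf('Speed test.py', buf))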
If you get rid of the bytes' decoding and use io.BytesIO instead of mere bytes objects, you'll get a considerable speedup:
import io, os

def Test_os(n, buf):
    for x in range(n):
        f = os.open('test.txt', os.O_RDONLY)
        data = io.BytesIO()
        # write() returns the number of bytes written; the loop stops when
        # os.read() returns b'' at end of file.
        while data.write(os.read(f, buf)):
            ...
        os.close(f)
Thus, the best result is now 0.038 seconds per call instead of 0.052 (a ~37% speedup).
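For completeness, one way to derive a per-read figure with the same monotonic clock (the run count, buffer size, and file here are placeholders, not the exact setup behind the 0.038/0.052 numbers):
import time

n, buf = 1000, 1048576          # placeholder run count and buffer size
s = time.monotonic()
Test_os(n, buf)
print((time.monotonic() - s) / n, "seconds per full read")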
Upvotes: 7