manikawnth

Reputation: 3249

Why is node.js faster than python in file reading?

I'm profiling Node.js vs Python at synchronously reading a 48 KB file.

Node.js code

var fs = require('fs');
var stime = new Date().getTime() / 1000;

for (var i=0; i<1000; i++){
  var content = fs.readFileSync('npm-debug.log');
}

console.log("Total time took is: " + ((new Date().getTime() / 1000) - stime));

Python Code

import time
stime = time.time()
for i in range(1000):
    with open('npm-debug.log', mode='r') as infile:
        ax = infile.read()

print("Total time is: " + str(time.time() - stime))

Timings are as follows:

$ python test.py
Total time is: 0.5195660591125488

$ node test.js
Total time took is: 0.25799989700317383

Where is the difference?

  1. In File IO or
  2. Python list data-structure allocation

Or am I not comparing apples to apples?

EDIT:

  1. Updated python's readlines() to read() for a good comparison
  2. Changed the iterations to 1000 from 500

PURPOSE:

To understand whether claims like "Node.js is slower than Python, which is slower than C" hold true, and if so, where the slowdown occurs in this context.

Upvotes: 2

Views: 3159

Answers (1)

ForceBru

Reputation: 44926

readlines returns a list of the lines in the file, so it has to scan the data for newline characters and keep building up a list of lines as it goes.

This is more complicated than simple file.read(), which would be the equivalent of what Node.js does.
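One caveat on that equivalence: fs.readFileSync called without an encoding returns a Buffer of raw bytes, while open(..., mode='r') also decodes the bytes to str. Opening the file in 'rb' mode is probably the closer match. Here is a minimal sketch for comparing the two modes; the helper names and the generated 48 KB sample file are assumptions made for illustration, not part of the original benchmark:

```python
import os
import tempfile
import time


def read_text(path):
    # Text mode: the bytes are decoded to str (newlines are translated too).
    with open(path, mode='r', encoding='utf8') as f:
        return f.read()


def read_binary(path):
    # Binary mode: raw bytes, no decoding; closer to Node's Buffer result.
    with open(path, mode='rb') as f:
        return f.read()


def time_reads(reader, path, n=1000):
    # Wall-clock seconds for n full reads of the file.
    start = time.perf_counter()
    for _ in range(n):
        reader(path)
    return time.perf_counter() - start


def make_sample(size=48 * 1024):
    # Hypothetical stand-in for npm-debug.log: an ASCII file of `size` bytes.
    fd, path = tempfile.mkstemp(suffix='.log')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'x' * size)
    return path


if __name__ == '__main__':
    path = make_sample()
    try:
        print('text  :', time_reads(read_text, path))
        print('binary:', time_reads(read_binary, path))
    finally:
        os.remove(path)
```

How much the decoding step costs depends on the file contents and the Python version, so it is worth measuring on the real file.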

Also, the length calculated by your Python script is the number of lines, while Node.js gets the number of characters.
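To make the difference between the two lengths concrete, here is a small sketch. The function names and the three-line sample file are made up for illustration:

```python
import os
import tempfile


def lines_and_chars(path):
    # readlines() splits the text into a list with one str per line...
    with open(path) as f:
        lines = f.readlines()
    # ...while read() returns the whole text as a single str.
    with open(path) as f:
        text = f.read()
    return len(lines), len(text)


def make_sample():
    # Hypothetical three-line sample file, written as raw bytes.
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'a\nbb\nccc\n')
    return path


if __name__ == '__main__':
    path = make_sample()
    try:
        print(lines_and_chars(path))  # (3, 9)
    finally:
        os.remove(path)
```

So len() on the readlines() result counts lines, while len() on the read() result counts characters.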


If you want even more speed, use os.open instead of open:

import os, time


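def Test_os(n):
    # Read the whole file n times through the low-level os.open/os.read API.
    for x in range(n):
        f = os.open('Speed test.py', os.O_RDONLY)
        data = ""
        # Read chunks of up to 1 MiB until os.read returns b'' (end of file).
        # Note: decoding chunk by chunk assumes that no multi-byte UTF-8
        # character straddles a chunk boundary.
        t = os.read(f, 1048576).decode('utf8')
        while t:
            data += t
            t = os.read(f, 1048576).decode('utf8')
        os.close(f)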

def Test_open(n):
    for x in range(n):
        with open('Speed test.py') as f:
            data = f.read()

s = time.monotonic()
Test_os(500000)
print(time.monotonic() - s)

s = time.monotonic()
Test_open(500000)
print(time.monotonic() - s)

On my machine os.open is several seconds faster than open. The output (Test_os first, then Test_open) is as follows:

53.68909174999999
58.12600833400029

As you can see, open is about 4.4 seconds (roughly 8%) slower than os.open over 500,000 runs, although the absolute difference shrinks as the number of runs decreases.

Also, you should try tweaking the buffer size of the os.read function as different values may give very different timings:

[Image: timings]

Here 'operation' means a single call to Test_os.
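A buffer-size experiment along these lines could be sketched as follows. This is not the exact script behind the measurements above; read_all, time_buffer_sizes, the sample file and the chosen sizes are all assumptions made for illustration:

```python
import os
import tempfile
import time


def read_all(path, buf):
    # os.O_BINARY exists only on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, 'O_BINARY', 0))
    try:
        chunks = []
        while True:
            chunk = os.read(fd, buf)  # returns b'' at end of file
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b''.join(chunks)


def time_buffer_sizes(path, sizes, n=200):
    # Map each buffer size to the seconds taken by n full reads of the file.
    results = {}
    for buf in sizes:
        start = time.perf_counter()
        for _ in range(n):
            read_all(path, buf)
        results[buf] = time.perf_counter() - start
    return results


if __name__ == '__main__':
    # Hypothetical 48 KB sample file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(b'x' * 48 * 1024)
    try:
        for buf, secs in time_buffer_sizes(path, [4096, 65536, 1048576]).items():
            print(buf, secs)
    finally:
        os.remove(path)
```

The best buffer size is likely to depend on the file size, the operating system and the disk cache, so treat any single set of numbers as specific to one machine.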


If you skip decoding the bytes and collect the chunks in an io.BytesIO buffer instead of concatenating them, you'll get a considerable speedup:

import io

def Test_os(n, buf):
    for x in range(n):
        f = os.open('test.txt', os.O_RDONLY)
        data = io.BytesIO()
        # os.read returns b'' at end of file, so write() returns 0
        # and the loop stops.
        while data.write(os.read(f, buf)):
            pass
        os.close(f)

[Image: speedup]

Thus, the best result is now 0.038 seconds per call instead of 0.052 (about 27% less time per call, i.e. a ~37% speedup).

Upvotes: 7
