Eric Pruitt

Reputation: 1903

BufferedReader in Python 2.x vs Python 3.x

I have a program that runs in both Python 2 and Python 3, but there is a drastic difference in speed. I understand that a number of internal changes were made in the switch, but the difference in io.BufferedReader performance is very large. In both versions I use io.BufferedReader because the main program loop only needs data one byte at a time. Here is an excerpt from the cProfile output for the script (look at cumtime, not tottime):

Python 2:
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 36984   0.188    0.000    0.545    0.000   io.py:929(read)

Python 3:
 36996    0.063   0.000    0.063    0.000   {method 'read' of '_io.BufferedReader' objects}

When I print the object, both versions show something like io.BufferedReader, so I am certain they are both using BufferedReader.

Here is the code in question. See line 28. The caller is responsible for setting up bufstream. I used bufstream = io.open('testfile', 'rb')
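The question's own code is not shown here, so the following is only a sketch of the access pattern it describes: one read(1) call on the buffered stream per byte consumed. iter_bytes and count_bytes are hypothetical names; count_bytes sets up bufstream with io.open(path, 'rb') the same way the question does.

```python
import io

def iter_bytes(bufstream):
    """Yield the stream's contents one byte at a time.

    Each iteration makes exactly one read(1) call, the call that shows up
    in the profile excerpts above.
    """
    while True:
        b = bufstream.read(1)
        if not b:  # an empty result signals end of file
            return
        yield b

def count_bytes(path):
    """Example caller that sets up bufstream the way the question does."""
    with io.open(path, 'rb') as bufstream:
        return sum(1 for _ in iter_bytes(bufstream))
```

iter_bytes works with any object that has a read() method, so it can be exercised on an in-memory io.BytesIO as well as on a real file.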

Why is there such a drastic difference in speed of BufferedReader for reading single bytes in the files, and how can I "fix" the issue for Python 2.x? I am running Python 2.6 and Python 3.1.

Upvotes: 2

Views: 2825

Answers (2)

John Machin

Reputation: 82942

To give you a fuller answer, one would need to see your code (or, better, an executable precis of your code).

However a partial answer can be gleaned from your profile output: io.py suggests that "Python 2" (for avoidance of doubt, give the actual version numbers) is implementing BufferedReader in Python, whereas _io.BufferedReader suggests that "Python3" is implementing it in C.
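One way to check this on a given interpreter, sketched under the assumption of CPython: ask which module defines the class of the object that io.open() returns. reader_module and check are hypothetical helper names, and os.devnull is used only because it exists on every platform.

```python
import io
import os

def reader_module(stream):
    """Return the name of the module that defines the stream's class.

    '_io' indicates the C implementation; 'io' (as expected on Python 2.6)
    or '_pyio' indicates a pure-Python one.
    """
    return type(stream).__module__

def check():
    """Open a file in 'rb' mode via io.open and describe the object returned."""
    with io.open(os.devnull, 'rb') as f:
        return type(f).__name__, reader_module(f)

if __name__ == '__main__':
    print(check())
```

On CPython 3 this prints ('BufferedReader', '_io'), matching the {method 'read' of '_io.BufferedReader' objects} line in the question's profile.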

Late-breaking news: Python 2.6's io.py is over 64 KB and includes the following comment near the top:

# This is a prototype; hopefully eventually some of this will be
# reimplemented in C.

Python 2.7's io.py is about 4 KB and appears to be a thin wrapper around an _io module.

If you want real assistance with a workaround for 2.6, show your code.

Probable workaround for Python 2.6

Instead of:

test = io.open('test.bmp', 'rb')

do this:

test = open('test.bmp', 'rb')
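If the same script has to run on several versions, the choice suggested by the timings that follow can be made at runtime. This is a minimal sketch, assuming the only goal is fast read(1) calls; open_for_byte_reads is a hypothetical name, not part of any library.

```python
import io
import sys

def open_for_byte_reads(path):
    """Open path for binary reading with the faster opener for this version.

    Python 2.6: io.open returns the pure-Python BufferedReader from io.py,
    so fall back to the builtin file object.
    Python 2.7+: io.open is backed by the C _io module (on Python 3 the
    builtin open is the same function as io.open).
    """
    if sys.version_info < (2, 7):
        return open(path, 'rb')
    return io.open(path, 'rb')
```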

Some rough timing figures, including the missing link (Python 2.7):

Windows 7 Pro, 32-bit, approx. 5 MB file; the guts of the code are:

while 1:
    c = f.read(1)
    if not c: break

2.6: io.open 20.4s, open 5.1s
2.7: io.open  3.3s, open 4.8s # io.open is better
3.1: io.open  3.6s, open 3.6s # effectively same code is used
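To reproduce figures like these on another machine, a rough harness might look like the following. It is a sketch, not the script behind the numbers above: the file contents, the 200,000-byte default size (the measurements above used roughly 5 MB), and the helper names are assumptions, and time.time() wall-clock timing is coarse. It also times _pyio.open, CPython's pure-Python io implementation, when that module is available (it is not on 2.6).

```python
import io
import os
import tempfile
import time

try:
    import _pyio  # pure-Python io implementation; present in 2.7+ and 3.x
except ImportError:
    _pyio = None  # e.g. Python 2.6, where io.py itself is pure Python

def time_read_loop(opener, path):
    """Wall-clock seconds for the loop above: read(1) until EOF."""
    start = time.time()
    with opener(path, 'rb') as f:
        while True:
            c = f.read(1)
            if not c:
                break
    return time.time() - start

def run(size=200000):
    """Time each available opener on a throwaway file of `size` zero bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'wb') as out:
            out.write(b'\0' * size)
        openers = {'io.open': io.open, 'open': open}
        if _pyio is not None:
            openers['_pyio.open'] = _pyio.open
        return dict((name, time_read_loop(op, path))
                    for name, op in openers.items())
    finally:
        os.remove(path)

if __name__ == '__main__':
    for name, secs in sorted(run().items()):
        print('%-10s %.2fs' % (name, secs))
```

On Python 3, io.open and open should come out about the same, since they are the same function; the _pyio.open row gives a rough idea of the cost of a pure-Python BufferedReader on a current interpreter.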

So a better story seems to be this: in general, don't faff about with io.open unless you have a good reason to, e.g. you want 2.7 to go faster.

Upvotes: 6

Kabie

Reputation: 10663

Using 2.7 should solve this. See PEP 3116 and the Python 2.7 docs.

Part of the io module is written in Python in 2.6, while in 2.7+ the whole module is written in C.

Upvotes: 4
