Maxim Veksler

Reputation: 30182

How to read lines from a mmapped file?

It seems that the mmap interface only supports readline(). If I try to iterate over the object, I get single characters instead of complete lines.

What would be the "pythonic" method of reading a mmap'ed file line by line?

import sys
import mmap
import os


if (len(sys.argv) > 1):
  STAT_FILE=sys.argv[1]
  print STAT_FILE
else:
  print "Need to know <statistics file name path>"
  sys.exit(1)


with open(STAT_FILE, "r") as f:
  map = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
  for line in map:
    print line # RETURNS single characters instead of whole line

Upvotes: 27

Views: 31724

Answers (5)

Minakshi Boruah

Reputation: 5

Even better, in case you get an error with mmap(), read the lines directly from the file object:

with open('/content/drive/MyDrive......', "r+b") as f:
    # map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # skipped: mmap was not recognized here (missing import)
    for line in iter(f.readline, b""):
        print(line)

Upvotes: -1

Sven Marnach

Reputation: 601659

The most concise way to iterate over the lines of an mmap is

with open(STAT_FILE, "r+b") as f:
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in iter(map_file.readline, b""):
        pass  # process line here

Note that in Python 3 the sentinel parameter of iter() must be of type bytes, while in Python 2 it needs to be a str (i.e. "" instead of b"").
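As a minimal sketch of the same idiom under Python 3: the helper name iter_mmap_lines and the temporary sample file are hypothetical stand-ins, not part of the answer above. The sketch uses access=mmap.ACCESS_READ instead of prot=mmap.PROT_READ, because the prot argument is only available on Unix.

```python
import mmap
import os
import tempfile


def iter_mmap_lines(path):
    """Yield each line of the file at `path` as bytes, newline included."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises an error.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # readline() returns b"" at end of file, which stops iter().
            for line in iter(mapped.readline, b""):
                yield line


# Hypothetical sample data standing in for the statistics file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\n\nlast without newline")
    sample_path = tmp.name

lines = list(iter_mmap_lines(sample_path))
os.remove(sample_path)
print(lines)
```

Each yielded line is a bytes object, so it may need line.decode(...) with the file's encoding before being treated as text.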

Upvotes: 39

Richard Aplin

Reputation: 109

With 32-bit Python 2.7 on Windows, reading is more than twice as fast on an mmapped file.

Tested on a 27MB, 509k-line text file (my 'parse' function is not interesting; it mostly just calls readline() very rapidly):

with open(someFile, "r") as f:
    if usemmap:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    else:
        m = f
    # parse whichever object was selected (mmap or plain file)
    e.parse(m)

With MMAP:

read in 0.308000087738

Without MMAP:

read in 0.680999994278

Upvotes: 0

hochl

Reputation: 12930

I modified your example like this:

with open(STAT_FILE, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
        line = m.readline()
        if line == '': break
        print line.rstrip()

Suggestions:

Hope this helps.

Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.

Normal file access:

real    2.410 2.414 2.428 2.478 2.490
sys     0.052 0.052 0.064 0.080 0.152
user    2.232 2.276 2.292 2.304 2.320

mmap file access:

real    1.885 1.899 1.925 1.940 1.954
sys     0.088 0.108 0.108 0.116 0.120
user    1.696 1.732 1.736 1.744 1.752

Those timings do not include the print statement (I excluded it). Following these numbers I'd say memory mapped file access is quite a bit faster.

Edit 2: Using python -m cProfile test.py I got the following results:

5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
5432833    1.451    0.000    1.451    0.000 {method 'readline' of 'mmap.mmap' objects}

If I'm not mistaken then mmap is quite a bit faster.

Additionally, it seems that testing not len(line) performs worse than testing line == '', at least that's how I interpret the profiler output.
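For anyone who wants to repeat a comparison like this, here is a rough Python 3 sketch using timeit. The function names, the generated sample file, and its size are all assumptions made for illustration (the file is far smaller than the 137MB one above), and the numbers it prints will depend on the Python version, the operating system, and file caching.

```python
import mmap
import os
import tempfile
import timeit


def count_lines_file(path):
    """Count lines using readline() on a regular binary file object."""
    count = 0
    with open(path, "rb") as f:
        for _ in iter(f.readline, b""):
            count += 1
    return count


def count_lines_mmap(path):
    """Count lines using readline() on a read-only memory map."""
    count = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for _ in iter(m.readline, b""):
                count += 1
    return count


# Hypothetical test data written to a temporary file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"some statistics line\n" * 20000)
    bench_path = tmp.name

# Sanity check first: both approaches should see the same number of lines.
file_count = count_lines_file(bench_path)
mmap_count = count_lines_mmap(bench_path)

# Time five full passes over the file with each approach.
file_time = timeit.timeit(lambda: count_lines_file(bench_path), number=5)
mmap_time = timeit.timeit(lambda: count_lines_mmap(bench_path), number=5)
os.remove(bench_path)

print("file: %.4fs  mmap: %.4fs" % (file_time, mmap_time))
```

Counting lines keeps the loop body trivial, so the measurement is dominated by readline() itself, which is similar in spirit to excluding the print statement.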

Upvotes: 15

NPE

Reputation: 500357

The following is reasonably concise:

with open(STAT_FILE, "r") as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
        line = m.readline()  
        if line == "": break
        print line
    m.close()

Note that line retains the newline, so you might like to remove it. It is also the reason why if line == "" does the right thing (an empty line is returned as "\n").

The reason the original iteration works the way it does is that mmap tries to look like both a file and a string. It looks like a string for the purposes of iteration.

I have no idea why it can't (or chooses not to) provide readlines()/xreadlines().
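The two faces of mmap can be seen side by side in a short Python 3 sketch. The sample file and the variable names are hypothetical; the point is only that plain iteration walks the map element by element, while readline() uses the file-like position.

```python
import mmap
import os
import tempfile

# Hypothetical sample data: two short lines, six bytes in total.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"ab\ncd\n")
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # String-like view: iteration produces one element per byte.
        per_byte = list(m)
        # File-like view: readline() reads from an internal position.
        # seek(0) makes the starting point explicit before reading lines.
        m.seek(0)
        per_line = list(iter(m.readline, b""))

os.remove(demo_path)
print(len(per_byte), per_line)
```

Wrapping the readline() loop in a small generator gives much the same convenience that readlines()/xreadlines() would have offered.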

Upvotes: 1
