Reputation: 76899
I need to run a regex match over a file, but I'm faced with an unexpected problem: the file is too big to read() or mmap() in one call, file objects don't support the buffer() interface, and the regex module takes only strings or buffers.
Is there an easy way to do this?
Upvotes: 1
Views: 1128
Reputation: 992757
The Python mmap module provides a nice Python-friendly way of memory mapping a file. On a 32-bit operating system, the maximum size of the file will be limited to no more than a GB or maybe two, but on a 64-bit OS you will be able to memory map a file of arbitrary size (until storage sizes exceed 2^64 bytes, of course).
I've done this with files of up to 30 GB (the Wikipedia XML dump file) in Python with excellent results.
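For illustration, here is a minimal sketch of combining mmap with the re module, assuming Python 3 and a 64-bit OS. The function name find_matches and its parameters are hypothetical, not something from the original answer. Because a memory-mapped file is a bytes-like object, the pattern has to be a bytes pattern (rb"..."), and matches come back as bytes.

```python
import mmap
import re


def find_matches(path, pattern):
    """Return a list of (offset, matched_bytes) for every match in the file.

    `pattern` must be a bytes pattern, since the mapped file is bytes-like.
    Note: mmap raises ValueError when asked to map an empty file.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The OS pages data in on demand, so the file is never
            # loaded into memory all at once.
            return [(m.start(), m.group()) for m in regex.finditer(mm)]
```

Calling it would look something like find_matches("dump.xml", rb"<title>.*?</title>"), where the filename is only a placeholder. For a file with a very large number of matches, it may be better to process each match inside the loop instead of collecting them all in a list.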
Upvotes: 6