salezica

Reputation: 76899

Python: regex on big file. Easy way?

I need to run a regex match over a file, but I'm faced with an unexpected problem: the file is too big to read() or mmap() in one call, File objects don't support the buffer() interface, and the regex module takes only strings or buffers.

Is there an easy way to do this?

Upvotes: 1

Views: 1128

Answers (1)

Greg Hewgill

Reputation: 992757

The Python mmap module provides a nice Python-friendly way of memory mapping a file, and because a mapped file supports the buffer interface, you can pass it directly to the re functions. On a 32-bit operating system, the maximum size of the mapping will be limited to no more than a GB or maybe two, but on a 64-bit OS you will be able to memory map a file of arbitrary size (until storage sizes exceed 2^64 bytes, of course).

I've done this with files of up to 30 GB (the Wikipedia XML dump file) in Python with excellent results.
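As a rough illustration of the approach, here is a minimal sketch assuming Python 3 on a 64-bit OS and a non-empty file. The function name find_offsets and its parameters are made up for this example; the pattern has to be a bytes pattern because the mapping exposes raw bytes.

```python
import mmap
import re


def find_offsets(path, pattern):
    """Return (offset, matched bytes) pairs for a bytes pattern in a file.

    find_offsets is a hypothetical helper for illustration only.
    """
    regex = re.compile(pattern)
    results = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The regex scans the mapping directly, so the OS pages data in
            # on demand instead of the whole file being read into memory.
            for match in regex.finditer(mm):
                results.append((match.start(), match.group()))
    return results
```

Calling something like find_offsets("dump.xml", rb"<title>[^<]*</title>") would collect every match with its byte offset; for a file with a very large number of matches you would probably want to process each match inside the loop rather than build a list.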

Upvotes: 6
