Reputation: 1221
I'm running into a problem while trying to load large files using Python 3.5. Calling read() with no arguments sometimes raises an OSError: Invalid argument. I then tried reading only part of the file and it seemed to work fine. I've determined that it starts to fail somewhere around 2.2 GB; below is the example code:
>>> sys.version
'3.5.1 (v3.5.1:37a07cee5969, Dec 5 2015, 21:12:44) \n[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
I also noticed that this does not happen in Python 2.7. Here is the same code run in Python 2.7:
>>> sys.version
'2.7.10 (default, Aug 22 2015, 20:33:39) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.1)]'
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.1*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read(int(2.2*10**9))
>>> x = open('/Users/username/Desktop/large.txt', 'r').read()
>>>
I am using OS X El Capitan 10.11.1.
Is this a bug, or should I use another method for reading the files?
Upvotes: 11
Views: 7847
Reputation: 160467
Yes, you have bumped into a bug.
The good news is that someone else has also found it and already created an issue for it in the Python bug tracker; see: Issue24658 - open().write() fails on 2 GB+ data (OS X). It seems to be platform dependent (OS X only) and is reproducible when using read and/or write. Apparently an issue exists with the way fread.c is implemented in the libc implementation for OS X, see here.
The bad news is that it is still open (and, currently, inactive), so you'll have to wait until it is resolved. Either way, you can take a look at the discussion there if you're interested in the specifics.
As a workaround, I'm pretty sure you can side-step the issue until it is fixed by reading in chunks and joining the chunks during processing. Do the same when writing. Unfortunate, but it should do the trick.
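A minimal sketch of that chunked-read workaround (the helper name and the 1 GiB chunk size are my own choices, not from the bug report); each individual read() call stays well under the ~2 GB threshold, so the faulty oversized libc call is never made:

```python
def read_in_chunks(path, chunk_size=2**30):
    """Read a whole file by repeated bounded read() calls (1 GiB each by default)."""
    parts = []
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty string means EOF
                break
            parts.append(chunk)
    return ''.join(parts)
```

The same pattern applies when writing: slice the data and issue several write() calls of bounded size instead of one huge one.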
Upvotes: 6