Reputation: 96
I'm trying to read large .txt files in their entirety into memory (one at a time) to pick random lines until certain conditions are met. I can't use `readlines()` or `linecache.getline()` or similar because the file's lines are delimited by `\n\n` instead of `\n`. (Splitting on `\n` results in weird half-sentences, etc.) Ideally I don't want to split the data into chunks either, to avoid oversampling from a particular part of the file. Currently, when I try to load the file into memory and separate along the file's delimiter using `read().split('\n\n')`, the program crashes with

    OSError: [Errno 22] Invalid argument

Am I just out of luck here, or is there a workaround? RAM is not an issue.
EDIT: I just tried loading the file into memory using Python 2.7.10 and the same `read().split('\n\n')`, which works fine with no error. So I suppose my question should be more specific: is there a workaround for Python 3+?
EDIT 2, per Ivan's insistence: you can replicate my issue using the following code:

    with open('file_larger_than_2gb.txt', 'r') as f:
        source = f.read().split('\n\n')

This works fine with Python 2 and triggers `OSError` with Python 3.
Upvotes: 2
Views: 1592
Reputation: 96
This issue was caused by a bug in the beta of macOS 10.13.6. The problem is fixed in the final release of 10.13.6, released on July 10, 2018.
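For anyone stuck on an affected build: since the error comes from the single oversized `read()` call rather than from the splitting itself, a possible workaround (a sketch, not from the original post; the helper name and chunk size are my own) is to accumulate the file in fixed-size chunks and split on the `\n\n` delimiter afterwards:

```python
def read_whole_file(path, chunk_size=1 << 20):
    """Read an entire text file into one string using bounded read() calls.

    Avoids issuing a single multi-gigabyte read(), which is what appeared
    to trigger the OSError on the affected macOS beta.
    """
    parts = []
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size characters per call
            if not chunk:               # empty string means end of file
                break
            parts.append(chunk)
    return ''.join(parts)

# Usage mirrors the original read().split('\n\n') approach:
# source = read_whole_file('file_larger_than_2gb.txt').split('\n\n')
```

The whole file still ends up in memory (which the question says is fine); only the size of each individual `read()` system call changes.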
Upvotes: 1