Eben

Reputation: 96

Avoiding OSError: [Errno 22] Invalid argument when read()ing large file

I'm trying to read large .txt files entirely into memory (one at a time) so I can pick random lines until certain conditions are met. I can't use readlines(), linecache.getline(), or similar because the file's lines are delimited by \n\n instead of \n (splitting on \n yields odd half-sentences, etc.). Ideally I don't want to split the data into chunks either, to avoid oversampling from a particular part of the file. Currently, when I load the file into memory and split on the file's delimiter using read().split('\n\n'), the program crashes with

OSError: [Errno 22] Invalid argument

Am I just out of luck here, or is there a workaround? RAM is not an issue.

EDIT: I just tried loading the file into memory using Python 2.7.10 and the same read().split('\n\n'), which works fine with no error. So I suppose my question should be more specific: is there a workaround for Python 3+?

EDIT2, per Ivan's insistence: You can replicate my issue using the following code:

with open('file_larger_than_2gb.txt', 'r') as f:
    source = f.read().split('\n\n')

This works fine with Python 2 and raises OSError with Python 3.

Upvotes: 2

Views: 1592

Answers (1)

Eben

Reputation: 96

This issue was caused by a bug in the beta of macOS 10.13.6. The problem is fixed in the full release of 10.13.6, released on July 10, 2018.
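For anyone who cannot update, here is a minimal sketch of a possible workaround. It assumes the failure is triggered by a single read() call that asks the OS for more than roughly 2 GB at once, so it reads the file in smaller pieces and joins them before splitting. The whole file still ends up in memory, so sampling is not biased toward any part of it. The function name read_paragraphs, the 1 GiB chunk size, and the UTF-8 encoding are my own assumptions, not part of the original code.

```python
def read_paragraphs(path, chunk_size=1 << 30, delimiter='\n\n', encoding='utf-8'):
    """Load a whole file via bounded reads, then split it on `delimiter`.

    Hypothetical helper: chunk_size (1 GiB) and encoding (UTF-8) are
    assumptions; adjust them to match your files.
    """
    pieces = []
    # Binary mode, so each read() call requests at most chunk_size bytes.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            pieces.append(chunk)
    # Decode only after joining, so a multi-byte character that straddles
    # two chunks is never cut in half.
    return b''.join(pieces).decode(encoding).split(delimiter)
```

It would be called as source = read_paragraphs('file_larger_than_2gb.txt'). Note that binary mode skips Python's newline translation, so a file with \r\n line endings would need delimiter='\r\n\r\n'. Joining and decoding also briefly holds more than one copy of the data, which should be acceptable if RAM is not a constraint.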

Upvotes: 1
