Reputation: 867
I need to split a very large file (3 GB) ten times in the following way: the first split separates the first 10% of the lines from the rest of the file, the second split separates the second 10% of the lines from the rest, and so on (this is for cross-validation).
I've done this naively by loading the lines of the file into a list, going through the list, and writing each line to the appropriate output file based on its index (a simplified sketch is shown after the example below). This is too slow, since it rewrites 3 GB of data for each split.
Is there a better way to do so?
Note: for my purposes, adding # to the start of a line is effectively the same as deleting it. Would it be smarter to add and remove # at the start of the relevant lines instead of writing new files?
EXAMPLE: if the file is [1,2,3,4,5,6,7,8,9,10] then I want to split it like this:
[1] and [2,3,4,5,6,7,8,9,10]
[2] and [1,3,4,5,6,7,8,9,10]
[3] and [1,2,4,5,6,7,8,9,10]
and so on
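Here is roughly what I'm doing now (simplified; the filenames and fold count are placeholders):

    N_FOLDS = 10

    # load the whole 3 GB file into memory at once
    with open("data.txt") as f:
        lines = f.readlines()

    fold_size = len(lines) // N_FOLDS

    for fold in range(N_FOLDS):
        start, end = fold * fold_size, (fold + 1) * fold_size
        with open(f"test_{fold}.txt", "w") as test, open(f"train_{fold}.txt", "w") as train:
            for i, line in enumerate(lines):
                # every line is rewritten for every fold -> ~30 GB of output in total
                (test if start <= i < end else train).write(line)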
Upvotes: 3
Views: 154
Reputation: 5440
I'd suggest trying to reduce the number of files you write: ten splits of a 3 GB file means writing about 30 GB, and even though that isn't too much for modern disks, it still takes a lot of time to write and process.
For example:
Assuming you want 10% of the lines, not 10% of the bytes, you could build an index of the byte offset at which each line starts, and access the (single, original) text file through that index (see the first sketch below).
You could also convert the original file to a fixed-record file, so that each line occupies the same number of bytes. Then you can jump directly to any line with seek().
Both of these approaches could be 'hidden' behind a file-like object in Python. That way you can access the single file as several 'virtual' files, each exposing only the part (or parts) you want (see the second sketch below).
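A minimal sketch of the index idea (one pass to record where each line starts, then seek() to pull out any 10% slice without copying the data; the filename and fold numbers are just placeholders):

    def build_line_index(path):
        """One pass over the file, recording the byte offset at which each line starts."""
        offsets = []
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                offsets.append(pos)
                pos += len(line)
        return offsets

    def read_lines(path, offsets, start, end):
        """Yield lines start..end-1 (0-based) by seeking into the original file."""
        with open(path, "rb") as f:
            f.seek(offsets[start])
            for _ in range(end - start):
                yield f.readline()

    # e.g. the test slice for fold k out of 10; the training data is everything outside [lo, hi)
    offsets = build_line_index("data.txt")
    n, k = len(offsets), 2
    lo, hi = k * n // 10, (k + 1) * n // 10
    test_lines = list(read_lines("data.txt", offsets, lo, hi))

The files are opened in binary mode so the recorded offsets are real byte positions (text mode could translate newlines and throw the offsets off).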
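And a sketch of the 'virtual file' idea: an object that presents only the training (or test) part of one fold, so downstream code can loop over it as if it were a separate file. It's an iterator rather than a full file object, and the class and parameter names are mine, not a standard API:

    class FoldView:
        """Iterable 'virtual file': yields only the lines belonging to one fold's
        test slice (test=True) or to everything else (test=False), by re-reading
        the single original file each time it is iterated."""

        def __init__(self, path, fold, n_folds, n_lines, test=True):
            self.path, self.fold = path, fold
            self.n_folds, self.n_lines, self.test = n_folds, n_lines, test

        def __iter__(self):
            lo = self.fold * self.n_lines // self.n_folds
            hi = (self.fold + 1) * self.n_lines // self.n_folds
            with open(self.path) as f:
                for i, line in enumerate(f):
                    if (lo <= i < hi) == self.test:
                        yield line

    # n_lines can come from the index above, or a one-off: sum(1 for _ in open("data.txt"))
    train_0 = FoldView("data.txt", fold=0, n_folds=10, n_lines=1_000_000, test=False)
    test_0 = FoldView("data.txt", fold=0, n_folds=10, n_lines=1_000_000, test=True)

Nothing extra is ever written to disk; each view just re-reads the one original file and skips the lines that don't belong to it.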
Upvotes: 1