user644745

Reputation: 5713

Pandas skiprows beyond 900000 fails

My CSV file contains 6 million records, and I am trying to split it into multiple smaller files using skiprows. My pandas version is 0.12.0, and the code is

pd.read_csv(TRAIN_FILE, chunksize=50000, header=None, skiprows=999999, nrows=100000)

It works as long as skiprows is less than 900000. Is this expected? If I do not use skiprows, nrows can go up to 5 million records. I have not yet tried beyond that, but will.

I tried a CSV splitter, but it does not handle the first entry properly, possibly because each cell contains multiple lines of code.

EDIT: I was able to split it by reading the entire 7 GB file with pandas read_csv and writing it out in parts to multiple CSV files.
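For reference, the approach above can be sketched roughly as follows (a minimal example; the filenames `train.csv` and `train_part_*.csv` and the tiny generated data are placeholders standing in for the real 7 GB file):

```python
import pandas as pd

# Stand-in for the large training file: 10 rows instead of 6 million.
pd.DataFrame({"a": range(10), "b": range(10)}).to_csv("train.csv", index=False)

# Read the file lazily in fixed-size chunks and write each chunk
# to its own numbered CSV file.
for i, chunk in enumerate(pd.read_csv("train.csv", chunksize=4)):
    chunk.to_csv("train_part_{}.csv".format(i), index=False)
```

With chunksize set, read_csv never loads the whole file into memory at once, which avoids the skiprows issue entirely.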

Upvotes: 6

Views: 565

Answers (1)

Matthias Ossadnik

Reputation: 911

The problem seems to be that you are specifying both nrows and chunksize. At least in pandas 0.14.0 using

pandas.read_csv(filename, nrows=some_number, chunksize=another_number)

returns a DataFrame (reading the whole file), whereas

pandas.read_csv(filename, chunksize=another_number)

returns a TextFileReader that loads the file lazily.
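A quick way to see the lazy behavior (a minimal sketch using an in-memory CSV; note this only demonstrates the chunksize-only case, since the interaction of nrows and chunksize has changed across pandas versions):

```python
from io import StringIO
import pandas as pd

csv_data = "a,b\n1,2\n3,4\n5,6\n"

# With chunksize alone, read_csv returns a lazy reader object,
# not a DataFrame; iterating it yields DataFrames of up to 2 rows each.
reader = pd.read_csv(StringIO(csv_data), chunksize=2)
chunks = list(reader)
```

Here `chunks` ends up as two DataFrames (two rows, then one row), confirming the data is delivered piecewise.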

Splitting a csv then works like this:

for i, chunk in enumerate(pandas.read_csv(filename, chunksize=your_chunk_size)):
    chunk.to_csv("part_{}.csv".format(i))

Upvotes: 1
