Fra_S

Reputation: 55

chunksize isn't starting from the first row of a CSV file

Using Python 3.

I have a very large CSV file that I need to split in two and save with to_csv. I'm using the chunksize parameter to determine how many rows end up in each file. The expectation is that the first call should read the required rows so I can save them to the first CSV file, and the second should take care of the remaining rows so I can save them to the second CSV file:

As an example, let's say the file is 3000 rows and I'm using the code below:

file = pd.read_csv(r'file.csv', index_col=None, header='infer', encoding='ISO-8859-1', skiprows=None, chunksize=500)

I've used skiprows=None as I want it to start from the beginning and chunk the first 500 rows.

Then the second call should skip the previous 500 and chunk the rest:

file = pd.read_csv(r'file.csv', index_col=None, header='infer', encoding='ISO-8859-1', skiprows=500, chunksize=2500)

However, the result I get from the first call is that it always goes straight to the last 500 rows and chunks those, rather than starting from the beginning as expected. It doesn't seem like skiprows is working correctly if chunksize always skips ahead to the last given number.

I would appreciate any suggestions on what might be going on here.

Upvotes: 4

Views: 1816

Answers (2)

JohnE

Reputation: 30434

It sounds like you don't really need chunksize at all, if I understand what you are trying to do. Here's code that reads the first 500 lines into df1 and the rest into df2, then combines them into a single dataframe in case you want to do that as well.

import pandas as pd

rows = 500

# Read the first 500 data rows, then everything after them;
# skiprows=rows+1 also skips the header line, so reuse df1's column names.
df1 = pd.read_csv('test.csv', nrows=rows)
df2 = pd.read_csv('test.csv', skiprows=rows + 1, names=df1.columns)

# Optionally recombine into a single dataframe.
df3 = pd.concat([df1, df2]).reset_index(drop=True)
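
If you want to verify that nothing was lost in the split, here's a quick sanity check (just a sketch; it assumes the whole file fits in memory):

import pandas as pd

# The recombined frame should match a plain full read; check_dtype=False
# because dtypes can be inferred differently per part (e.g. a column that
# only has missing values after row 500).
pd.testing.assert_frame_equal(df3, pd.read_csv('test.csv'), check_dtype=False)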

If you just want to read the original file and write out two new CSV files without keeping any intermediate dataframes around, perhaps this is what you want:

# Store the column names (only the header and a couple of rows are read here).
names = pd.read_csv('test.csv', nrows=2).columns

# Write the first 500 rows and the remainder to separate files;
# index=False keeps the outputs in the same shape as the input.
pd.read_csv('test.csv', nrows=rows).to_csv('foo1.csv', index=False)
pd.read_csv('test.csv', skiprows=rows + 1, names=names).to_csv('foo2.csv', index=False)
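
And if the file is too large to read whole, the same split can be streamed with chunksize (a minimal sketch reusing the example names above; note that mode='a' appends, so remove any stale foo2.csv before re-running):

import pandas as pd

rows = 500

# Streaming variant: never holds more than `rows` rows in memory.
reader = pd.read_csv('test.csv', chunksize=rows)
reader.get_chunk(rows).to_csv('foo1.csv', index=False)           # first 500 rows
for i, chunk in enumerate(reader):                               # rows 500 onward
    chunk.to_csv('foo2.csv', mode='a', header=(i == 0), index=False)
reader.close()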

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210882

As soon as you pass a non-default (non-None) value for the chunksize parameter, pd.read_csv() returns a TextFileReader iterator instead of a DataFrame and will read your CSV file in chunks of the specified size:

reader = pd.read_csv(filename, chunksize=N)
for df in reader:
    # process df (chunk) here
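
For example, a small sketch (using the question's file name and encoding) shows that the chunks arrive in file order, starting at row 0:

import pandas as pd

# Chunks are yielded in order from the top of the file;
# nothing is skipped unless you ask for it via skiprows.
reader = pd.read_csv('file.csv', encoding='ISO-8859-1', chunksize=500)
for i, df in enumerate(reader):
    print(f'chunk {i}: rows {i * 500} to {i * 500 + len(df) - 1}')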

So when using chunksize, all chunks (except possibly the very last one) will have the same length. Using the iterator parameter instead, you can control how much data to read on each call with get_chunk(nrows):

In [66]: reader = pd.read_csv(fn, iterator=True)

Let's read the first 3 rows:

In [67]: reader.get_chunk(3)
Out[67]:
          a         b         c
0  2.229657 -1.040086  1.295774
1  0.358098 -1.080557 -0.396338
2  0.731741 -0.690453  0.126648

Now we'll read the next 5 rows:

In [68]: reader.get_chunk(5)
Out[68]:
          a         b         c
0 -0.009388 -1.549381  0.913128
1 -0.256654 -0.073549 -0.171606
2  0.849934  0.305337  2.360101
3 -1.472184  0.641512 -1.301492
4 -2.302152  0.417787  0.485958

And the next 7 rows:

In [69]: reader.get_chunk(7)
Out[69]:
          a         b         c
0  0.492314  0.603309  0.890524
1 -0.730400  0.835873  1.313114
2  1.393865 -1.115267  1.194747
3  3.038719 -0.343875 -1.410834
4 -1.510598  0.664154 -0.996762
5 -0.528211  1.269363  0.506728
6  0.043785 -0.786499 -1.073502
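
Applied to the question's 3000-row file, the whole split becomes two get_chunk() calls (a sketch reusing the question's file name and encoding; the output file names are made up):

import pandas as pd

# The question's 500 / 2500 split via get_chunk().
reader = pd.read_csv('file.csv', encoding='ISO-8859-1', iterator=True)
reader.get_chunk(500).to_csv('first_500.csv', index=False)     # rows 0-499
reader.get_chunk(2500).to_csv('rest_2500.csv', index=False)    # rows 500-2999
reader.close()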

From the docs:

iterator : boolean, default False

Return TextFileReader object for iteration or getting chunks with get_chunk().

chunksize : int, default None

Return TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

Upvotes: 3
