Reputation: 1960
I need to output data from pandas into CSV files to interact with a third-party process.
The process requires that I pass it no more than 100,000
records in a file, or it will cause issues (slowness, perhaps a crash).
That said, how can I write something that takes a dataframe in pandas and splits it into frames of 100,000 records each? Nothing would be different other than that the exported dataframes would be subsets of the parent dataframe.
I assume I could do a loop with something like this, but I suspect it would be remarkably inefficient.
First, taking recordcount=len(df.index)
to get the number of records and then looping until I get there using something like
df1 = df.iloc[currentrecord:currentrecord + 100000]
And then exporting that to a CSV file.
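Spelled out, that loop would look something like this (an untested sketch; the output file names are just placeholders):

    recordcount = len(df.index)
    for filenum, currentrecord in enumerate(range(0, recordcount, 100000)):
        # Take the next slice of up to 100,000 rows and export it
        df1 = df.iloc[currentrecord:currentrecord + 100000]
        df1.to_csv('output_{}.csv'.format(filenum))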
There has to be an easier way.
Upvotes: 2
Views: 2533
Reputation: 316
You can try something like this:
def save_df(df, chunk_size=100000):
    df_size = len(df)
    for i, start in enumerate(range(0, df_size, chunk_size)):
        # Write each slice of up to chunk_size rows to its own numbered CSV
        df[start:start + chunk_size].to_csv('df_name_{}.csv'.format(i))
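For example, assuming df is the parent frame:

    save_df(df)  # writes df_name_0.csv, df_name_1.csv, ...

Each file holds at most chunk_size rows; pass index=False to the to_csv call inside the function if you don't want the index column in the output files.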
Upvotes: 1
Reputation: 1471
You could add a column with a group label and then use groupby:
# Repeat each group label 100,000 times, then truncate to the frame's length
df1['Dummy'] = [a for b in zip(*[range(N)] * 100000) for a in b][:len(df1)]
Where N is set to a value large enough, the minimum being:
import numpy as np

N = int(np.ceil(len(df1) / 100000))
Then group by that column and apply a function that calls to_csv():
def save_group(df):
    # Drop the helper column and name the file after the group label
    df.drop('Dummy', axis=1).to_csv("Dataframe_" + str(df['Dummy'].iloc[0]) + ".csv")

df1.groupby('Dummy').apply(save_group)
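As a side note, the same group labels can also be built with plain integer division (a simpler variant, not what the answer above uses), which avoids choosing N by hand:

    import numpy as np

    # Rows 0-99999 get label 0, rows 100000-199999 get label 1, and so on
    df1['Dummy'] = np.arange(len(df1)) // 100000

The groupby/save_group step stays exactly the same.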
Upvotes: 1