How to write a pandas DataFrame to CSV in fixed-size chunks

I need to output data from pandas into CSV files to interact with a third-party process.

The process requires that I pass it no more than 100,000 records per file, or it will cause issues (slowness, perhaps a crash).

That said, how can I take a dataframe in pandas and split it into frames of 100,000 records each? Nothing would differ between them other than that each exported dataframe would be a subset of the parent dataframe.

I assume I could do a loop along these lines, but I suspect it would be remarkably inefficient.

First, take recordcount = len(df.index) to get the number of records, then loop until I reach it using something like

df1 = df.iloc[currentrecord:currentrecord + 100000]

And then export that to a CSV file.
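Spelled out, the loop I have in mind would look something like this (an untested sketch; the output filenames are just placeholders):

df = ...  # the existing pandas DataFrame to split

recordcount = len(df.index)
currentrecord = 0
while currentrecord < recordcount:
    # slice the next block of up to 100,000 rows by position
    df1 = df.iloc[currentrecord:currentrecord + 100000]
    df1.to_csv('chunk_{}.csv'.format(currentrecord // 100000))
    currentrecord += 100000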

There has to be an easier way.

Upvotes: 2

Views: 2533

Answers (2)

Alexander Fridman

Reputation: 316

You can try something like this:

def save_df(df, chunk_size=100000):
    # write the frame out in chunks of at most chunk_size rows,
    # one numbered CSV file per chunk
    df_size = len(df)
    for i, start in enumerate(range(0, df_size, chunk_size)):
        df.iloc[start:start + chunk_size].to_csv('df_name_{}.csv'.format(i))
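For example, calling save_df(df) on a 250,000-row frame writes df_name_0.csv, df_name_1.csv and df_name_2.csv, with the last file holding the remaining 50,000 rows.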

Upvotes: 1

Borja

Reputation: 1471

You could add a column holding a group number, and then use groupby:

df1['Dummy'] = [a for b in zip(*[range(N)] * 100000) for a in b][:len(df1)]

where N is any sufficiently large value, the minimum being:

N = int(np.ceil(len(df1) / 100000))
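With numpy imported as np, an equivalent and simpler way to build the same Dummy column (a sketch using integer division of the row position) would be:

import numpy as np

# rows 0-99999 get group 0, rows 100000-199999 get group 1, and so on
df1['Dummy'] = np.arange(len(df1)) // 100000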

Then group by that column and apply a function that calls to_csv():

def save_group(df):
    # drop the helper column and write each group to its own CSV file
    df.drop('Dummy', axis=1).to_csv("Dataframe_" + str(df['Dummy'].iloc[0]) + ".csv")

df1.groupby('Dummy').apply(save_group)
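This writes one file per group (Dataframe_0.csv, Dataframe_1.csv, and so on), each without the helper Dummy column. One aside: in some pandas versions, groupby(...).apply calls the function twice on the first group, which is harmless here since it just rewrites the same file.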

Upvotes: 1
