aerin

Reputation: 22684

Pickle dump Pandas DataFrame

This is a question from a lazy man.

I have a pandas DataFrame with 4 million rows and would like to save it as smaller chunks of pickle files.

Why smaller chunks? To save and load them more quickly.

My questions are: 1) Is there a better way (a built-in function) to save it in smaller pieces than manually chunking it with np.array_split (a sketch of what I mean follows below)?

2) Is there any graceful way of gluing the chunks back together when I read them, other than manually concatenating them?

Please feel free to suggest any data format other than pickle that is suited for this job.
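
For reference, a minimal sketch of the manual approach I mean (the chunk count and file names are just placeholders):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4_000_000, 5))

# Manual chunking: split into 10 roughly equal pieces and pickle each one.
for i, chunk in enumerate(np.array_split(df, 10)):
    chunk.to_pickle(f'chunk_{i}.pkl')

# Manual gluing: read every chunk back and concatenate into one frame.
df_restored = pd.concat(pd.read_pickle(f'chunk_{i}.pkl') for i in range(10))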

Upvotes: 3

Views: 2721

Answers (2)

piRSquared

Reputation: 294488

I've been using this for a DataFrame of size 7,000,000 x 250.

Use HDF5 (see the pandas HDF5 documentation):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5))
df


# Write the frame to a compressed HDF5 store under the key 'this_df'.
df.to_hdf('myrandomstore.h5', 'this_df', append=False, complib='blosc', complevel=9)

# Read it back in one call.
new_df = pd.read_hdf('myrandomstore.h5', 'this_df')
new_df

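If you want to write in pieces instead of all at once, you can append each chunk to the same store; a sketch, assuming df_chunks is some iterable of DataFrames (appending requires format='table'):

# Append every chunk to one table-format store ('df_chunks' is hypothetical).
for chunk in df_chunks:
    chunk.to_hdf('myrandomstore.h5', key='this_df', append=True,
                 format='table', complib='blosc', complevel=9)

# The chunks come back glued together as a single DataFrame.
new_df = pd.read_hdf('myrandomstore.h5', 'this_df')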

Upvotes: 3

kpie

Reputation: 11110

If the goal is to save and load quickly, you should look into using SQL rather than pickling. If your computer chokes when you ask it to write 4 million rows at once, you can specify a chunksize.

From there you can query slices with standard SQL.
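
A minimal sketch with the built-in sqlite3 driver, assuming df is the 4-million-row frame from the question (the file name mydata.db and table name frame are placeholders):

import sqlite3
import pandas as pd

con = sqlite3.connect('mydata.db')

# chunksize writes the rows in batches of 100,000 instead of all at once.
df.to_sql('frame', con, if_exists='replace', index=False, chunksize=100_000)

# Query back just a slice with standard SQL.
part = pd.read_sql('SELECT * FROM frame LIMIT 100000', con)
con.close()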

Upvotes: 4
