Reputation: 22684
This is a question from a lazy man.
I have a pandas DataFrame with 4 million rows and would like to save it into smaller chunks of pickle files.
Why smaller chunks? To save and load them more quickly.
My questions are: 1) Is there a better way (a built-in function) to save the frame in smaller pieces than manually chunking it with np.array_split?
2) Is there a graceful way of gluing the chunks back together when I read them, other than concatenating them manually? (A sketch of the manual approach I mean is below.)
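For context, the manual approach I'm trying to avoid looks roughly like this (the chunk count and file names are arbitrary placeholders):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4_000_000, 5))  # stand-in for my real frame

# save: split into 8 pieces and pickle each one
for i, chunk in enumerate(np.array_split(df, 8)):
    chunk.to_pickle(f'chunk_{i}.pkl')

# load: read every piece and glue them back together
restored = pd.concat(pd.read_pickle(f'chunk_{i}.pkl') for i in range(8))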
Please feel free to suggest any data type better suited for this job than pickle.
Upvotes: 3
Views: 2721
Reputation: 294488
I've been using this approach for a DataFrame of size 7,000,000 x 250.
Use HDF5 via pandas' HDFStore (see the pandas HDFStore documentation).
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5))  # small demo frame

# write with blosc compression (requires the `tables` package)
df.to_hdf('myrandomstore.h5', 'this_df', append=False, complib='blosc', complevel=9)

new_df = pd.read_hdf('myrandomstore.h5', 'this_df')  # read it all back in one call
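One addition that maps onto the original chunking goal (my assumption, not something I've benchmarked): if you store in 'table' format (slower to write than the default 'fixed' format, but queryable), you can read back a partial slice or stream the store in chunks instead of loading everything at once:

# table format: slower writes, but supports partial reads
df.to_hdf('myrandomstore.h5', 'this_df_table', format='table', complib='blosc', complevel=9)

# pull back only rows 0-99
subset = pd.read_hdf('myrandomstore.h5', 'this_df_table', start=0, stop=100)

# or iterate over the store in chunks
for chunk in pd.read_hdf('myrandomstore.h5', 'this_df_table', chunksize=2):
    print(chunk.shape)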
Upvotes: 3