Reputation: 915
I have 600 DataFrames stored as .pickle files and I'd like to merge (or rather append) them into one DataFrame. Their total size is 10 GB.
When I read each of them, append it to one big DataFrame, and then save the full version to disk, the entire process takes 2 hours on a 16 GB machine.
I think it takes so long because each time I append a new DataFrame, the system allocates new memory for the entire combined DataFrame?
How can I do this faster?
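For reference, a simplified sketch of the current loop (the folder name and output path are placeholders, and the per-iteration concat stands in for the append call):
import pandas as pd
from pathlib import Path
big_df = pd.DataFrame()
for f in Path('pickles').glob('*.pickle'):
    # each iteration copies the whole accumulated frame into a new object
    big_df = pd.concat([big_df, pd.read_pickle(f)])
big_df.to_pickle('combined.pickle')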
Upvotes: 2
Views: 3135
Reputation: 16571
A small improvement on the accepted answer is to use a generator expression inside concat:
from pandas import read_pickle, concat
df = concat(read_pickle(f) for f in list_of_files)
By removing the list comprehension [...], we reduce the memory footprint of the operation, since there is no need to hold all of the results of the list comprehension in memory at once.
Note that the list_of_files variable should contain the list of files, e.g. globbed using pathlib:
from pathlib import Path
list_of_files = Path('.').glob('*.pickle')
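One caveat worth noting: Path.glob returns a lazy generator, so list_of_files can only be iterated once; wrapping it in sorted (or list) materialises it and also makes the concatenation order deterministic. A small self-contained sketch of that variant:
from pathlib import Path
from pandas import concat, read_pickle
list_of_files = sorted(Path('.').glob('*.pickle'))
df = concat(read_pickle(f) for f in list_of_files)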
Upvotes: 1
Reputation: 18377
Rather than appending them one by one, I suggest you use pd.concat() and pass all the dataframes in one go.
import os, pandas as pd
Output = pd.concat([pd.read_pickle(os.path.join('location', x)) for x in os.listdir('location')])
We can create the list of dataframes with a list comprehension, assuming the pickle files are all saved in the same folder, and use pd.concat to concatenate them into a single dataframe. This way the result is allocated only once, instead of the growing frame being copied on every append.
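If the individual frames carry overlapping row indexes, an optional tweak (not part of the original answer) is to let pd.concat renumber the combined result:
Output = pd.concat([pd.read_pickle(os.path.join('location', x)) for x in os.listdir('location')], ignore_index=True)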
Upvotes: 7