James Bond

Reputation: 7913

Is Pandas concat an in-place function?

I guess this question needs some insight into the implementation of concat.

Say I have 30 files, 1 GB each, and I can only use up to 32 GB of memory. I load the files into a list of DataFrames called 'list_of_pieces'. This list_of_pieces should be ~30 GB in size, right?

If I do pd.concat(list_of_pieces), does concat allocate another 30 GB (or maybe 10 or 15 GB) on the heap and do some operations there, or does it run the concatenation 'in-place' without allocating new memory?
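Roughly, the setup looks like this (a sketch only; the CSV file names are made up):

    import pandas as pd

    # load all 30 pieces into memory; the paths here are hypothetical
    paths = ['piece{:02d}.csv'.format(i) for i in range(30)]
    list_of_pieces = [pd.read_csv(p) for p in paths]   # ~30 GB held in memory

    result = pd.concat(list_of_pieces)   # does this allocate another ~30 GB?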

Does anyone know?

Thanks!

Upvotes: 19

Views: 20761

Answers (2)

Try this:

    import pandas as pd

    dfs = [df1, df2]

    # copy=False asks pandas to avoid copying where it can, but the combined
    # frame temp is still built in memory
    temp = pd.concat(dfs, copy=False, ignore_index=False)

    # empty df1, then assign the combined result back into it
    df1.drop(df1.index[0:], inplace=True)
    df1[temp.columns] = temp

Upvotes: 2

Jeff

Reputation: 129018

The answer is no, this is not an in-place operation; np.concatenate is used under the hood. See here: Concatenate Numpy arrays without copying
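You can see this for yourself with a tiny sketch (my own example, not tied to the question's data):

    import numpy as np

    a = np.ones(3)
    b = np.zeros(3)
    c = np.concatenate([a, b])     # allocates a brand new array

    print(np.shares_memory(c, a))  # False: the result does not reuse a's buffer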

A better approach to the problem is to write each of these pieces to an HDFStore table, see here: http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables for the docs, and here: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore for some recipes.
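Something along these lines (a rough sketch; 'pieces.h5' and the CSV paths are hypothetical):

    import pandas as pd

    # append each piece to a single on-disk table instead of holding all of
    # them in memory at once
    with pd.HDFStore('pieces.h5', mode='w') as store:
        for path in ['piece01.csv', 'piece02.csv']:   # ... one per file
            piece = pd.read_csv(path)
            store.append('pieces', piece, data_columns=True)  # columns queryable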

Then you can select whatever portions (or even the whole set) as needed, by query or even by row number.
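For instance, continuing the sketch above (the column name 'A' and the bounds are made up for illustration):

    import pandas as pd

    with pd.HDFStore('pieces.h5') as store:
        subset = store.select('pieces', where='A > 0')            # by query
        first_rows = store.select('pieces', start=0, stop=1000)   # by row number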

Certain types of operations can even be done while the data is on disk: https://github.com/pydata/pandas/issues/3202?source=cc, and here: http://pytables.github.io/usersguide/libref/expr_class.html#
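As a small illustration of the PyTables side (a sketch only; the array names and file name are made up):

    import numpy as np
    import tables

    with tables.open_file('bigdata.h5', mode='w') as f:
        a = f.create_carray('/', 'a', obj=np.arange(1e6))
        b = f.create_carray('/', 'b', obj=np.arange(1e6))
        out = f.create_carray('/', 'out', atom=tables.Float64Atom(), shape=a.shape)

        # the expression is evaluated chunk by chunk, writing straight to disk
        expr = tables.Expr('2 * a + b')   # picks up a and b from the local scope
        expr.set_output(out)
        expr.eval()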

Upvotes: 16
