oceanbeach96

Reputation: 634

Merging two DataFrames in chunks

Objective

What would be the best approach to merge df1 and df2 together, where df2 is merged in by chunks? I currently get a memory error when merging in df2.

Without chunks I do the following:

df = df1.merge(df2, how='left', left_on=['x','y'], right_on=['x','y'])

Upvotes: 2

Views: 3108

Answers (2)

sunnynm

Reputation: 1

The current answer does not work properly, so I am adding some clarification. The DataFrame that gets chunked must be the left side of the join. If you chunk the other DataFrame instead and join the full left DataFrame against each chunk, you will get duplicate rows. Say you want to left join df2 onto df1 like below:

df = df1.merge(df2, how='left', left_on=['x','y'], right_on=['x','y'])

Then df1 must be chunked. If df2 is chunked instead, the end result will contain a lot of duplicate rows: each chunk of df2 re-emits a full copy of df1, and the rows of df1 that have no match in that particular chunk come back as extra NaN-filled rows. I tried the approach proposed by the other answer and my df went from 7k rows to 24k rows for exactly this reason.
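A minimal sketch that reproduces the blow-up, using made-up toy frames (two rows each, chunk size 1, purely for illustration):

import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': [1, 2], 'a': ['p', 'q']})
df2 = pd.DataFrame({'x': [1, 2], 'y': [1, 2], 'b': ['r', 's']})

# Chunk df2 (the right side of the left join) into one-row chunks.
chunks = [df2[i:i+1] for i in range(len(df2))]
res = pd.concat([df1.merge(c, how='left', left_on=['x','y'], right_on=['x','y'])
                 for c in chunks])

# res has 4 rows instead of 2: every chunk emits all of df1,
# with NaNs wherever that chunk had no matching keys.
print(len(res))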

Thus the correct method is:

import pandas as pd

n = 200000  # chunk row size
list_df = [df1[i:i+n] for i in range(0, df1.shape[0], n)]

res = pd.DataFrame()

for chunk in list_df:
    res = pd.concat([res, chunk.merge(df2, how='left', left_on=['x','y'], right_on=['x','y'])])
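One caveat about the loop above: calling pd.concat inside the loop re-copies the accumulated result on every iteration. A variant (same merge logic, just a different accumulation strategy) that collects the merged chunks in a list and concatenates once at the end avoids that:

merged_chunks = []
for chunk in list_df:
    merged_chunks.append(chunk.merge(df2, how='left', left_on=['x','y'], right_on=['x','y']))
res = pd.concat(merged_chunks, ignore_index=True)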

Upvotes: 0

Mayank Porwal

Reputation: 34086

You can split the large dataframe into chunks of, say, 200K rows:

import pandas as pd

n = 200000  # chunk row size
list_df = [df2[i:i+n] for i in range(0, df2.shape[0], n)]

Then merge each chunked df with df1:

res = pd.DataFrame()

for chunk in list_df:
    res = pd.concat([res, df1.merge(chunk, how='left', left_on=['x','y'], right_on=['x','y'])])

Upvotes: 2
