
Reputation: 1

Share the same Pandas DataFrame between Pool processes without copying it again and again

I have a dataframe which holds a query result of about 1 million rows or more.

When I pass this to a map function that compares two dataframes, the dataframe above gets copied into every process and gives me a memory error.

Sample code

import pandas as pd
from functools import partial
from multiprocessing import Pool

df = pd.read_sql_query('Query returning 1 million or more rows')

def comparison(df, item):
    # comparison logic which uses the df object mentioned above
    ...

p = Pool(2)
fn = partial(comparison, df)
p.map(fn, 'some iterator')

What I want is that when the comparison function is mapped to different processes, it should not copy the df again and again.

I have tried moving the query-fetching part (building the df) inside the comparison function. That works, but it gets executed again for each iterator object, and since the query takes 40-50 seconds to run, this is a time overhead every time. I therefore only want to do it once and use it every time.

Upvotes: 0

Views: 179

Answers (1)

AKX

Reputation: 168966

You commented: "I am on Windows and this df object is in the main function."

Then you're out of luck.

Since there isn't copy-on-write memory on Windows, you can't share a Python variable transparently between multiple processes without copying occurring.
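For contrast, here is a minimal sketch of what copy-on-write sharing looks like on a POSIX system, where `multiprocessing` can use the `fork` start method. The DataFrame here is a small stand-in for the question's SQL result, and the names are illustrative:

```python
import pandas as pd
from multiprocessing import get_context

# Built once in the parent; forked children inherit the same memory
# pages via copy-on-write, so read-only access does not duplicate it.
df = pd.DataFrame({"id": range(10), "value": range(10)})  # stand-in for read_sql_query

def comparison(key):
    # Reads the inherited module-level df without pickling or copying it.
    return int(df.loc[df["id"] == key, "value"].iloc[0])

if __name__ == "__main__":
    with get_context("fork").Pool(2) as pool:
        print(pool.map(comparison, [1, 3, 5]))  # [1, 3, 5]
```

This only works where `fork` is available (Linux, and optionally macOS); Windows always spawns fresh processes, which is exactly why the approach fails there.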

(Copy-on-write mmaps do exist, but to the best of my knowledge they can't serve as the backing memory for DataFrames.)
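One common mitigation on Windows, sketched below, is a `Pool` initializer: each worker still gets its own copy of the data, but the expensive load runs once per worker instead of once per mapped item. The toy DataFrame and the names `load_df`/`comparison` are illustrative, not from the question:

```python
import pandas as pd
from multiprocessing import Pool

df = None  # filled in per worker by the initializer

def load_df():
    # Stand-in for the expensive pd.read_sql_query(...); runs once per
    # worker process, not once per mapped item.
    global df
    df = pd.DataFrame({"id": range(5), "value": range(5)})

def comparison(key):
    # Uses the per-worker df that the initializer loaded.
    return int(df.loc[df["id"] == key, "value"].iloc[0])

if __name__ == "__main__":
    with Pool(2, initializer=load_df) as pool:
        print(pool.map(comparison, [0, 2, 4]))  # [0, 2, 4]
```

With two workers and a 40-50 second query, that is roughly two query executions total rather than one per iterator item.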

Upvotes: 2
