Reputation: 11
I have two dataframes and some code to extract data from one of them and add it to the other:
import pandas as pd

sales = pd.read_excel("data.xlsx", sheet_name="sales", header=0)
born = pd.read_excel("data.xlsx", sheet_name="born", header=0)

bornuni = born.number.unique()
for babies in bornuni:
    dataframe = born[born["number"] == babies]
    for i, r in sales.iterrows():
        if r["number"] == babies:
            sales.loc[i, "ini_weight"] = dataframe["weight"].iloc[0]
            sales.loc[i, "ini_date"] = dataframe["date of birth"].iloc[0]
        else:
            pass
This is pretty inefficient with bigger data sets, so I want to parallelize this code but I don't have a clue how to do it. Any help would be great. Here is a link to a mock dataset.
Upvotes: 1
Views: 74
Reputation: 9597
So before worrying about parallelizing, I can't help but notice that you're using lots of for loops to deal with the dataframes. Dataframes are pretty fast when you use their vectorized capabilities, and I see a lot of inefficient pandas use here, so maybe we first fix that and then worry about throwing more CPU cores at it.
It seems to me you want to accomplish the following:
For each unique baby id number in the born dataframe, you want to update the ini_weight and ini_date fields of the corresponding entry in the sales dataframe.
There's a good chance that you can use some dataframe merging / joining to help you with that, as well as using the pivot table functionality:
https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
I strongly suggest you take a look at those, try using the ideas from these articles, and then reframe your question in terms of these operations, because as you correctly notice, looping over all the rows repeatedly to find the row with some matching index is very inefficient.
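To make the merge idea concrete, here is a minimal sketch of how it could replace both loops in one vectorized pass. The tiny inline dataframes and their column names ("number", "weight", "date of birth") are assumptions based on the code in the question, since the linked mock dataset isn't available here:

```python
import pandas as pd

# Stand-in data with the column names from the question.
sales = pd.DataFrame({"number": [1, 2, 3], "item": ["a", "b", "c"]})
born = pd.DataFrame({
    "number": [1, 2],
    "weight": [3.2, 2.9],
    "date of birth": ["2020-01-01", "2020-02-01"],
})

# Keep one row per baby id (mirrors taking .iloc[0] in the loop),
# and rename the columns to the names the loop wrote into sales.
first_born = (
    born.drop_duplicates(subset="number")[["number", "weight", "date of birth"]]
    .rename(columns={"weight": "ini_weight", "date of birth": "ini_date"})
)

# One left merge attaches ini_weight / ini_date to every matching
# sales row at once; unmatched rows get NaN, like the loop's `pass`.
sales = sales.merge(first_born, on="number", how="left")
```

This does the same work as the nested loops but lets pandas match the ids internally, which is typically orders of magnitude faster than row-by-row `iterrows` updates.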
Upvotes: 2