How to optimize recursive function calls and inner loops in pandas?

Question

I want to find all the parents (ancestors) of a particular child in a DataFrame. My current code takes more than 20 seconds to run on a 3,000-row dataset. I suspect this is because of the recursive function calls and loops I have used. Can you help me optimise the program?

I tried searching for the parent of the child node, printing it, and then treating that parent as the new child. I then recursively find its parent, and so on, until all parents have been found.

import pandas as pd

df = pd.DataFrame(
    {
        "parent_name": ["Car", "Tyre", "Tyre", "Rubber", "Nylon", "Nylon", "Trees", "Trees"],
        "child_name": ["Tyre", "Rubber", "Nylon", "Trees", "Chemicals", "Man-made", "Leaves", "Stems"],
    }
)

I then defined a function that uses this DataFrame to find all parent nodes:

def get_parent_list(node_id):
    list_of_parents = []

    # Recursively collect the parent_name of every row whose child_name is node_id
    def find_parent(node_id):
        parent_names = df.loc[df["child_name"] == node_id, "parent_name"]
        for parent_name in parent_names:
            list_of_parents.append(parent_name)
            find_parent(parent_name)  # treat the parent as the next child

    # Start the walk at the requested node and return everything collected
    find_parent(node_id)
    return list_of_parents


df["list_of_parents"] = df["child_name"].apply(get_parent_list)
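A likely cost in the code above is that every `df.loc[...]` call scans the whole `child_name` column, and `apply` repeats those scans for every row at every level of the hierarchy. A minimal sketch of one way around this, assuming the hierarchy fits in memory: build a child-to-parents dict once with `groupby`, then walk upward with an explicit stack instead of recursion. The names `parents_of` and `get_parent_list_fast` are hypothetical, and the `seen` set is an added guard against cycles (the sample data has none); note that it also drops repeated ancestors, which the recursive version would list twice if two paths led to the same node.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "parent_name": ["Car", "Tyre", "Tyre", "Rubber", "Nylon", "Nylon", "Trees", "Trees"],
        "child_name": ["Tyre", "Rubber", "Nylon", "Trees", "Chemicals", "Man-made", "Leaves", "Stems"],
    }
)

# One pass over the frame: child -> list of its direct parents.
parents_of = df.groupby("child_name")["parent_name"].agg(list).to_dict()


def get_parent_list_fast(node_id):
    """Return all ancestors of node_id, nearest first, without recursion."""
    ancestors = []
    seen = set()
    # Push parents in reverse so the first listed parent is visited first,
    # matching the depth-first order of the recursive version.
    stack = list(reversed(parents_of.get(node_id, [])))
    while stack:
        parent = stack.pop()
        if parent in seen:  # guard against cycles and repeated ancestors
            continue
        seen.add(parent)
        ancestors.append(parent)
        stack.extend(reversed(parents_of.get(parent, [])))
    return ancestors


df["list_of_parents"] = df["child_name"].map(get_parent_list_fast)
print(get_parent_list_fast("Trees"))  # ['Rubber', 'Tyre', 'Car']
```

Each lookup in `parents_of` is an average constant-time dict access, so the work per row grows with the depth of the hierarchy rather than with the number of rows in the frame.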

I store the result as a separate column in the DataFrame.
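Because many rows share the same ancestors (every descendant of Tyre ends with Tyre, Car), filling the whole column can also be sped up by memoization: each node's ancestor list is computed once and reused. A minimal sketch using `functools.lru_cache`, assuming the hierarchy has no cycles and is shallower than Python's recursion limit; `ancestors` is a hypothetical name, and it returns tuples so that cached results cannot be mutated by callers.

```python
from functools import lru_cache

import pandas as pd

df = pd.DataFrame(
    {
        "parent_name": ["Car", "Tyre", "Tyre", "Rubber", "Nylon", "Nylon", "Trees", "Trees"],
        "child_name": ["Tyre", "Rubber", "Nylon", "Trees", "Chemicals", "Man-made", "Leaves", "Stems"],
    }
)

# child -> list of direct parents, built once
parents_of = df.groupby("child_name")["parent_name"].agg(list).to_dict()


@lru_cache(maxsize=None)
def ancestors(node_id):
    """All ancestors of node_id, nearest first; each node is computed only once."""
    result = []
    for parent in parents_of.get(node_id, []):
        # Same order as the original: the parent, then that parent's ancestors.
        result.append(parent)
        result.extend(ancestors(parent))
    return tuple(result)


# Convert the cached tuples back to lists for the new column.
df["list_of_parents"] = df["child_name"].map(lambda name: list(ancestors(name)))
```

Unlike the stack-based walk, this keeps the original's output exactly, including repeated ancestors when two paths meet.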

After this, I would just search the DataFrame for the user's input and display the corresponding value from the list_of_parents column as output.

Expected output:

If the user gives "Trees" as input:

Output: Trees: Rubber, Tyre, Car
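Once the list_of_parents column exists, the search-and-display step can be a boolean mask on `child_name` followed by a string join. A minimal sketch, assuming child names are unique and the column has already been filled in by `get_parent_list`; two rows are hard-coded here only to keep it runnable, and `describe_parents` is a hypothetical helper that also handles names that are not in the table.

```python
import pandas as pd

# Assumed: list_of_parents was already computed (e.g. by get_parent_list);
# two rows are hard-coded here only to keep the sketch runnable.
df = pd.DataFrame(
    {
        "child_name": ["Tyre", "Trees"],
        "list_of_parents": [["Car"], ["Rubber", "Tyre", "Car"]],
    }
)


def describe_parents(name):
    """Format the stored ancestors of name as 'name: p1, p2, ...'."""
    matches = df.loc[df["child_name"] == name, "list_of_parents"]
    if matches.empty:
        return f"{name}: not found"
    # Assumes child names are unique, so the first match is the only one.
    return f"{name}: {', '.join(matches.iloc[0])}"


user_input = "Trees"  # stand-in for input() in an interactive script
print(describe_parents(user_input))  # Trees: Rubber, Tyre, Car
```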
