big_soapy

Reputation: 137

Carrying resulting dataframes through different functions

I have 8 functions that I would like to run under one main() function. The process starts with importing from a file and creating a df and then doing some cleaning operations on that df under a new function. I have copied in the basic structure including the three starting functions and then a main() function. What I am unsure about is how to 'carry' the result of loader() to clean_data() and then the result of clean_data() to operation_one() in the right way. At the moment I get an error that df is not defined. Thank you for your help!

def loader():

    import pandas as pd
    import numpy as np

    df = pd.read_excel('file_example.xlsx')
    return df


def clean_data():
        
    del df['column_7']
    return df


def operation_one():
        
    del df['column_12']
    return df


def main():

    loader()
    clean_data()
    operation_one()

    with pd.ExcelWriter("file.xlsx") as writer:
        df.to_excel(writer, sheet_name='test' , index=False)

if __name__ == "__main__":
    main()

Upvotes: 0

Views: 93

Answers (2)

Grave

Reputation: 68

Your main function only tells the other functions to run. Each function keeps its own variables, and they are not visible outside the function that defines them. So when loader() runs, it returns the value of df to the line in main() that called it, but nothing there stores it. To keep that value, write df = loader() in main().

Each later function then needs that value passed in so it can operate on it. Call clean_data(df) from main(), and change the definition from def clean_data(): to def clean_data(df): so the function accepts the argument. Assign the result back with df = clean_data(df), and do the same for operation_one().
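To see the scoping rule in isolation, here is a minimal sketch that needs no Excel file. The small in-memory frame and the name clean_data_broken are stand-ins invented for this demo; they are not part of the original code.

```python
import pandas as pd


def loader():
    # Stand-in for pd.read_excel(...): a tiny in-memory frame (assumed data).
    df = pd.DataFrame({'column_7': [1, 2], 'column_8': [3, 4]})
    return df  # the name df is local; only the returned object leaves the function


def clean_data_broken():
    # Hypothetical broken version: no parameter, and no df defined in this
    # function or at module level, so calling it raises NameError.
    del df['column_7']
    return df


def clean_data(df):
    # df arrives as an argument supplied by the caller.
    del df['column_7']
    return df


def main():
    loader()  # return value is discarded, nothing is bound to a name
    try:
        clean_data_broken()
    except NameError as exc:
        print('broken version:', exc)

    df = loader()        # keep the returned frame
    df = clean_data(df)  # pass it to the next step and keep the result
    return df
```

Calling main() here should print the NameError message from the broken version and then return a frame without column_7.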

Here is your code, cleaned up a bit:

import pandas as pd
import numpy as np

def loader():
    df = pd.read_excel('file_example.xlsx')
    return df


def clean_data(df):
    del df['column_7']
    return df


def operation_one(df):
    del df['column_12']
    return df


def main():
    df = loader()
    df = clean_data(df)
    df = operation_one(df)

    with pd.ExcelWriter("file.xlsx") as writer:
        df.to_excel(writer, sheet_name='test', index=False)

if __name__ == "__main__":
    main()

I hope this helps. It is the first question I have answered here.

Upvotes: 1

Matthew Borish

Reputation: 3086

You need to make sure to assign variables for the function return values. That is how you "carry" the result. You also need to pass in those variables as function arguments as you proceed. Adding a function parameter for the filename in loader() rather than hardcoding the file in the function is probably something you'll want to think about too.

import pandas as pd
import numpy as np


def loader():

    df = pd.read_excel('file_example.xlsx')
    return df


def clean_data(df):
        
    del df['column_7']
    return df


def operation_one(df):
        
    del df['column_12']
    return df


def main():

    df = loader()
    df = clean_data(df)
    df = operation_one(df)
   

    with pd.ExcelWriter("file.xlsx") as writer:
        df.to_excel(writer, sheet_name='test' , index=False)

if __name__ == "__main__":
    main()
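The suggestion above about giving loader() a filename parameter could look something like the following. This is only a sketch: the parameter names path, in_path, and out_path are illustrative and do not come from the original post, and the defaults simply reuse the filenames already shown.

```python
import pandas as pd


def loader(path):
    # The file to read is now chosen by the caller instead of being hardcoded.
    return pd.read_excel(path)


def clean_data(df):
    del df['column_7']   # removes the column from the frame in place
    return df


def operation_one(df):
    del df['column_12']  # removes the column from the frame in place
    return df


def main(in_path='file_example.xlsx', out_path='file.xlsx'):
    # Each step receives the frame as an argument and its result is
    # assigned back, so the data is carried from one function to the next.
    df = loader(in_path)
    df = clean_data(df)
    df = operation_one(df)

    with pd.ExcelWriter(out_path) as writer:
        df.to_excel(writer, sheet_name='test', index=False)
```

With this shape, main() can be called with no arguments to keep the original behaviour, or with other paths to run the same steps on a different workbook.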

Upvotes: 1
