Should I use classes for pandas.DataFrame?

Question

I have more of a general question. I've written a couple of functions that transform data successively:

import pandas as pd


def func1(df):
    pass

...


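def main():
    df = pd.read_csv()  # input path omitted here

    # Each function takes a DataFrame and returns the transformed one
    df1 = func1(df)
    df2 = func2(df1)
    df3 = func3(df2)
    df4 = func4(df3)
    df4.to_csv()  # output path omitted here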


if __name__ == "__main__":
    main()

Is there a better way of organizing the logic of my script?

Should I use classes for cases like this when everything is tied to one dataset?

Vee · Accepted Answer

It depends on your use case. From what I understand, I would use a dictionary of the functions that process a df. For instance:

import pandas as pd

function_returning_a_df = {"f1": func1, "f2": func2, "f3": func3}
df = pd.read_csv(csv)  # csv: path to your input file

If this df needs three functions applied, list their keys in the order they should run:

df_processing = ["f1", "f2", "f3"]  # functions are applied in this order

# If you need to keep the df from every step, collect them in a list
dfs_processed = []

for func in df_processing:
    dfs_processed.append(df)  # optional: save the df as it was before this step
    df = function_returning_a_df[func](df)

# After the loop, df holds the fully processed result
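
To make the idea concrete, here is a minimal runnable sketch of the same dictionary-driven approach, wrapped in a small helper. The step functions (drop_missing, add_total, keep_large), the column names a and b, the STEPS registry, and run_pipeline are all hypothetical stand-ins, since the real func1 to func3 are not shown in the question. It uses an in-memory DataFrame instead of a CSV so it can be run as-is.

```python
import pandas as pd


# Hypothetical transformation steps standing in for func1..func3.
def drop_missing(df):
    return df.dropna()


def add_total(df):
    return df.assign(total=df["a"] + df["b"])


def keep_large(df):
    return df[df["total"] > 3]


# Registry mapping step names to the functions that implement them.
STEPS = {
    "drop_missing": drop_missing,
    "add_total": add_total,
    "keep_large": keep_large,
}


def run_pipeline(df, step_names, steps=STEPS):
    """Apply the named steps in order.

    Returns the final DataFrame and a list holding the input
    plus the result of every step.
    """
    history = [df]
    for name in step_names:
        df = steps[name](df)
        history.append(df)
    return df, history


# Small in-memory example instead of pd.read_csv(...)
raw = pd.DataFrame({"a": [1, 2, None], "b": [1, 5, 3]})
result, history = run_pipeline(raw, ["drop_missing", "add_total", "keep_large"])
print(result)
```

Because none of the steps modify their input in place, raw is left untouched and history lets you inspect each intermediate frame. Whether that is worth the extra memory depends on how large your data is.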
