dleal

Reputation: 2314

Suggestions for handling multiple Python pandas scripts

I currently have several Python pandas scripts that I keep separate for two reasons: 1) readability, and 2) I am sometimes interested in the output of the individual scripts.

However, the CSV output of one script is generally the CSV input of the next, and in each script I have to re-parse the datetime columns, which is inconvenient.

What best practices do you suggest for this task? Is it better to combine all the scripts into one for when I want to run the whole program, or is there a more idiomatic Python/pandas way to deal with this?

Thank you, I appreciate all your comments.

Upvotes: 0

Views: 155

Answers (2)

F. Moïni

Reputation: 71

If I understand your question correctly, using modules would be the best approach.

You can keep your scripts separated and import them as modules when needed in a dependent script. For example:

Script 1 (saved as script_1.py, so that script 2 can import it):

import pandas

def create_pandas_dataframe():
    # Creating a dataframe ...
    df = pandas.DataFrame()
    return df

def run():
    # Run the script 1
    df = create_pandas_dataframe()
    # Here, call other functions specific to this script

if __name__ == '__main__':
    # Run the script
    run()

Script 2:

from script_1 import create_pandas_dataframe

def use_pandas_dataframe(a_df):
    print(a_df)

if __name__ == '__main__':
    df = create_pandas_dataframe()
    use_pandas_dataframe(df)

This way, you can directly use the output of an existing function as input for another one without them being in the same script.
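The same idea extends to a chain of steps. The following is only a minimal sketch, not the asker's actual code: the function names (`load_raw`, `add_daily_total`, `run_pipeline`), the column names, and the sample data are all hypothetical. It shows DataFrames being passed between steps in memory, so datetime columns are parsed once, with an optional CSV dump for when an intermediate result is of interest.

```python
import pandas as pd


def load_raw():
    # Hypothetical first step: build (or read) the raw data once,
    # parsing datetimes a single time.
    return pd.DataFrame({
        "timestamp": pd.to_datetime([
            "2020-01-01 08:00",
            "2020-01-01 17:00",
            "2020-01-02 09:00",
        ]),
        "value": [1, 2, 4],
    })


def add_daily_total(df):
    # Hypothetical second step: works directly on the DataFrame it receives,
    # so the datetime dtype is still intact and .dt accessors just work.
    out = df.copy()
    out["day"] = out["timestamp"].dt.normalize()
    out["daily_total"] = out.groupby("day")["value"].transform("sum")
    return out


def run_pipeline(save_intermediate_to=None):
    # Run every step in order; optionally write the intermediate result
    # to CSV for inspection without making later steps depend on that file.
    raw = load_raw()
    if save_intermediate_to is not None:
        raw.to_csv(save_intermediate_to, index=False)
    return add_daily_total(raw)


result = run_pipeline()
print(result)
```

Each step stays readable and can still be run or tested on its own, while the full run avoids the write-then-reparse round trip between scripts.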

Upvotes: 1

Pierre Schroeder

Reputation: 712

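Instead of writing a CSV output that you have to re-parse, you can write and read the pandas.DataFrame in an efficient binary format with the methods pandas.DataFrame.to_pickle() and pandas.read_pickle(), respectively. A pickled DataFrame keeps its column dtypes, so datetime columns come back as datetimes without any extra parsing.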
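As a rough illustration, assuming a small made-up DataFrame and temporary file names (`step_1.pkl` and `step_1.csv` are placeholders), the sketch below round-trips the same data through pickle and through CSV and compares the resulting dtypes.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data with a datetime column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 00:00", "2020-01-02 12:30"]),
    "value": [1.5, 2.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Pickle round trip: dtypes are stored along with the data.
    pkl_path = os.path.join(tmp_dir, "step_1.pkl")
    df.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)

    # CSV round trip without parse_dates: the timestamps come back as text.
    csv_path = os.path.join(tmp_dir, "step_1.csv")
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print(restored.dtypes)
print(from_csv.dtypes)
```

Pickle files are specific to Python and should only be loaded from sources you trust, so CSV may still be the better choice for output meant to be shared or inspected by hand.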

Upvotes: 1
