Reputation: 2314
I currently have several Python pandas scripts that I keep separate for two reasons: 1) readability, and 2) I am sometimes interested in the output of the individual scripts.
However, the CSV output of one script is generally the CSV input of the next, and in each script I have to re-parse the datetimes, which is inconvenient.
What best practices do you suggest for this task? Is it better to just combine all the scripts into one for when I want to run the whole program, or is there a more Pythonic/pandas way to deal with this?
Thank you, I appreciate all your comments.
Upvotes: 0
Views: 155
Reputation: 71
If I understand your question correctly, I think using modules would be the best approach.
You can keep your scripts separated and import them as modules when needed in a dependent script. For example:
Script 1 (saved as script_1.py):

    import pandas


    def create_pandas_dataframe():
        # Creating a dataframe ...
        df = pandas.DataFrame()
        return df


    def run():
        # Run the script 1
        df = create_pandas_dataframe()
        # Here, call other functions specific to this script


    if __name__ == '__main__':
        # Run the script only when executed directly, not when imported
        run()
Script 2:

    from script_1 import create_pandas_dataframe


    def use_pandas_dataframe(a_df):
        print(a_df)


    if __name__ == '__main__':
        df = create_pandas_dataframe()
        use_pandas_dataframe(df)
This way, you can directly use the output of an existing function as input for another one without them being in the same script.
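To make the idea a bit more concrete, here is a minimal sketch of how the stages could be chained in memory, with CSV output written only when you actually want to inspect it. The function names (`load_raw`, `add_cumulative`, `run_pipeline`), the column names, and the sample data are all hypothetical and only there for illustration; your real stages would replace them.

```python
import pandas as pd


def load_raw():
    # Stage 1 (hypothetical): build a frame with a real datetime column.
    return pd.DataFrame({
        "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "value": [1.0, 2.0, 3.0],
    })


def add_cumulative(df):
    # Stage 2 (hypothetical): derive a new column without touching the disk.
    out = df.copy()
    out["cumulative"] = out["value"].cumsum()
    return out


def run_pipeline(csv_path=None):
    # Chain the stages in memory; write a CSV only when a path is given.
    df = add_cumulative(load_raw())
    if csv_path is not None:
        df.to_csv(csv_path, index=False)
    return df


if __name__ == "__main__":
    print(run_pipeline())
```

Because the DataFrame is passed from one function to the next, the datetime column keeps its dtype and never needs to be re-parsed between stages.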
Upvotes: 1
Reputation: 712
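Instead of writing CSV output that you have to re-parse, you can write and read the pandas.DataFrame in an efficient binary format with the methods pandas.DataFrame.to_pickle() and pandas.read_pickle(), respectively. A pickled DataFrame keeps its column dtypes, so datetime columns come back as datetimes without any extra parsing.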
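As a rough illustration, the sketch below round-trips a small DataFrame through both a pickle file and a CSV file and compares the results. The file names and sample data are made up for the example, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with a datetime column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-01 12:00", "2020-01-02 13:30"]),
    "value": [10, 20],
})

with tempfile.TemporaryDirectory() as tmp:
    # Pickle round trip: dtypes are stored along with the data.
    pkl_path = os.path.join(tmp, "stage_1.pkl")  # hypothetical file name
    df.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)

    # CSV round trip without parse_dates: the timestamps come back as text.
    csv_path = os.path.join(tmp, "stage_1.csv")  # hypothetical file name
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print(restored.dtypes)
print(from_csv.dtypes)
```

With the CSV route you would need to pass `parse_dates=["timestamp"]` to `pandas.read_csv()` in every downstream script to get the datetime column back, whereas the pickle route needs no extra arguments.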
Upvotes: 1