applying the same operation to different dataframes

Question

I couldn't find anything about this in Python. I've been working on three different datasets for some machine learning projects, and typing out exactly the same commands for the same operation on each dataframe has become a bit of an arduous task. It started with this:

aviva = pd.read_csv('data/LON_AV_.csv', parse_dates=['Date'], index_col='Date', date_parser=dateparse)
admiral = pd.read_csv('data/LON_ADM.csv', parse_dates=['Date'], index_col='Date', date_parser=dateparse)
three = pd.read_csv('data/LON_III.csv', parse_dates=['Date'], index_col='Date', date_parser=dateparse)
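The `dateparse` function passed as `date_parser` is not shown in the question, and the `date_parser` argument of `read_csv` has been deprecated since pandas 2.0. A minimal, version-independent sketch, assuming the `Date` column uses a hypothetical `%Y-%m-%d` format: read the file with `index_col='Date'` and convert the index with `pd.to_datetime` afterwards. `DATE_FORMAT` and `read_prices` are illustrative names, not part of the question. On pandas 2.0 or later, passing `date_format=` together with `parse_dates` is another option.

```python
import io

import pandas as pd

# Hypothetical date format; replace with whatever dateparse actually handles.
DATE_FORMAT = '%Y-%m-%d'


def read_prices(path_or_buffer):
    """Hypothetical helper: read a CSV and parse its 'Date' index explicitly."""
    df = pd.read_csv(path_or_buffer, index_col='Date')
    # Convert the index here instead of relying on the deprecated date_parser.
    df.index = pd.to_datetime(df.index, format=DATE_FORMAT)
    return df


# Small in-memory sample standing in for data/LON_AV_.csv (made-up values).
sample = io.StringIO('Date,Close\n2020-01-02,100.5\n2020-01-03,101.0\n')
aviva = read_prices(sample)
print(aviva.index[0])  # 2020-01-02 00:00:00
```

The same helper can then be called once per file, which already removes the repeated keyword arguments from the three `read_csv` lines.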

In the middle, many identical commands were applied to each dataframe.

It ended with this:

three.to_csv('three_x.csv')
three_label.to_csv('three_y.csv')

admiral.to_csv('admiral_x.csv')
admiral_label.to_csv('admiral_y.csv')

aviva.to_csv('aviva_x.csv')
aviva_label.to_csv('aviva_y.csv')

My question is: is there any way to speed up the process so that I don't have to keep repeating code like this? Thank you, and have a good day.

MaxU - stand with Ukraine · Accepted Answer

I would do it this way:

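import os

import pandas as pd

# dateparse is the asker's own date-parsing function from the question (not shown)

# path templates: {} is filled with the file stem (input) or the dataset name (outputs)
in_csv_template = 'data/LON_{}.csv'
out_csv_template = 'out/{}_x.csv'
out_label_template = 'out/{}_y.csv'

# dataset name -> file stem used in the input file name
cfg = {
    'aviva': 'AV_',
    'admiral': 'ADM',
    'three': 'III',
}

def process(fi_csv, fo_csv, fo_label, **kwargs):
    # extra keyword arguments are passed straight through to read_csv
    df = pd.read_csv(fi_csv, **kwargs)

    # the per-dataframe commands from the question go here;
    # they must also create df_label
    ...

    df.to_csv(fo_csv)
    df_label.to_csv(fo_label)


# to_csv does not create missing directories
os.makedirs('out', exist_ok=True)

for k, v in cfg.items():
    process(in_csv_template.format(v),
            out_csv_template.format(k),
            out_label_template.format(k),
            parse_dates=['Date'],
            index_col='Date',
            date_parser=dateparse)  # deprecated since pandas 2.0 in favour of date_format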
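To see the pattern run end to end, here is a self-contained sketch under stated assumptions: it writes three tiny made-up CSVs into a temporary directory and stands in for the elided `...` step with a hypothetical labelling rule (label = 1 when the next row's `Close` is higher). The `Close` column, the label rule and the `make_label` helper are illustrations only, not part of the answer. Dates are parsed with `pd.to_datetime` so the example does not depend on `dateparse`.

```python
import os
import tempfile

import pandas as pd

# dataset name -> file stem, as in the answer
cfg = {'aviva': 'AV_', 'admiral': 'ADM', 'three': 'III'}


def make_label(df):
    """Hypothetical labelling step: 1 if the next row's Close is higher, else 0."""
    return (df['Close'].shift(-1) > df['Close']).astype(int).rename('label')


def process(fi_csv, fo_csv, fo_label, **kwargs):
    df = pd.read_csv(fi_csv, **kwargs)
    df.index = pd.to_datetime(df.index)  # stands in for dateparse
    df_label = make_label(df)            # stands in for the elided commands
    df.to_csv(fo_csv)
    df_label.to_csv(fo_label)


# Build data/ and out/ inside a fresh temporary directory.
root = tempfile.mkdtemp()
data_dir = os.path.join(root, 'data')
out_dir = os.path.join(root, 'out')
os.makedirs(data_dir)
os.makedirs(out_dir)

in_csv_template = os.path.join(data_dir, 'LON_{}.csv')
out_csv_template = os.path.join(out_dir, '{}_x.csv')
out_label_template = os.path.join(out_dir, '{}_y.csv')

# Write a tiny sample input for each dataset (made-up demo data).
for stem in cfg.values():
    with open(in_csv_template.format(stem), 'w') as f:
        f.write('Date,Close\n2020-01-02,10\n2020-01-03,11\n2020-01-06,9\n')

for k, v in cfg.items():
    process(in_csv_template.format(v),
            out_csv_template.format(k),
            out_label_template.format(k),
            index_col='Date')

print(sorted(os.listdir(out_dir)))
# ['admiral_x.csv', 'admiral_y.csv', 'aviva_x.csv', 'aviva_y.csv', 'three_x.csv', 'three_y.csv']
```

Because `process` forwards `**kwargs` to `read_csv`, each call can still pass `parse_dates` and `index_col` exactly as in the answer, and adding a fourth dataset only needs one more `cfg` entry.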
