entercaspa

Reputation: 704

Applying the same operation to different DataFrames

I couldn't find anything about this in Python. I've been working on three different datasets for some machine learning projects, and it has been arduous typing out the same commands in exactly the same way to perform the same operation on each DataFrame. It started with this:

aviva =  pd.read_csv('data/LON_AV_.csv', parse_dates=['Date'], index_col='Date', date_parser=dateparse )
admiral= pd.read_csv('data/LON_ADM.csv', parse_dates=['Date'], index_col='Date', date_parser=dateparse )
three =  pd.read_csv('data/LON_III.csv', parse_dates=['Date'], index_col='Date', date_parser=dateparse )

In the middle there were many commands that were applied identically to each DataFrame.

It ended with this:

three.to_csv('three_x.csv')
three_label.to_csv('three_y.csv')

admiral.to_csv('admiral_x.csv')
admiral_label.to_csv('admiral_y.csv')

aviva.to_csv('aviva_x.csv')
aviva_label.to_csv('aviva_y.csv')

My question is this: is there a way to speed up the process so that I don't have to repeat code like this every time? Thank you, and have a good day.

Upvotes: 2

Views: 181

Answers (2)

MaxU - stand with Ukraine

Reputation: 210872

I would do it this way:

in_csv_template = 'data/LON_{}.csv'
out_csv_template = 'out/{}_x.csv'
out_label_template = 'out/{}_y.csv'

cfg = {
    'aviva': 'AV_',
    'admiral': 'ADM',
    'three': 'III',
}

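def process(fi_csv, fo_csv, fo_label, **kwargs):
    # kwargs are passed straight through to pd.read_csv
    df = pd.read_csv(fi_csv, **kwargs)

    ...  # the shared processing steps go here; they must also define df_label

    df.to_csv(fo_csv)
    df_label.to_csv(fo_label)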


for k, v in cfg.items():
    process(in_csv_template.format(v),
            out_csv_template.format(k),
            out_label_template.format(k),
            parse_dates=['Date'],
            index_col='Date',
            date_parser=dateparse)
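
For illustration, here is a minimal runnable sketch of the same pattern with the elided middle filled in by a made-up step. The `Close` column, the "next day's close" label, and the in-memory sample data are all assumptions, not part of the question. `date_parser=dateparse` is left out because `dateparse` isn't shown in the question (and `date_parser` is deprecated as of pandas 2.0 in favor of `date_format`).

```python
import io

import pandas as pd


def process(source, **read_kwargs):
    """Read one CSV and return (features, label) for it."""
    df = pd.read_csv(source, **read_kwargs)

    # Hypothetical stand-in for the shared steps:
    # use the next row's Close as the label for the current row.
    label = df['Close'].shift(-1).rename('next_close')

    # The last row has no "next" value, so drop it from both outputs.
    features = df.iloc[:-1]
    label = label.iloc[:-1]
    return features, label


# In-memory sample so the sketch runs without any files on disk.
sample = io.StringIO(
    "Date,Close\n"
    "2017-01-03,10.0\n"
    "2017-01-04,11.0\n"
    "2017-01-05,12.0\n"
)

features, label = process(sample, parse_dates=['Date'], index_col='Date')
```

Returning the two frames instead of writing them inside the function keeps `process` easy to test; the caller can still do `features.to_csv(...)` and `label.to_csv(...)` in the loop.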

Upvotes: 0

unutbu

Reputation: 879919

Instead of three separate DataFrames (aviva, admiral, three), use one dict whose keys are those names and whose values are the DataFrames:

dfs = dict()
for filename, name in [('LON_AV_.csv', 'aviva'), 
                       ('LON_ADM.csv', 'admiral'), 
                       ('LON_III.csv', 'three')]:
    dfs[name] = pd.read_csv('data/{}'.format(filename), parse_dates=['Date'], 
                            index_col='Date', date_parser=dateparse)

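    ...  # the shared steps go here, operating on dfs[name]; label (a dict like dfs) must be filled in here too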

    dfs[name].to_csv('{}_x.csv'.format(name))
    label[name].to_csv('{}_y.csv'.format(name))
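
Once the DataFrames live in a dict, any step that used to be typed three times can be written once as a function and applied to every entry. A small sketch of that idea follows; the `add_returns` helper, the `Close` column, and the sample numbers are hypothetical, chosen only to make the example runnable.

```python
import pandas as pd


def add_returns(df):
    """Hypothetical shared step: add a row-to-row percentage-change column."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out['Return'] = out['Close'].pct_change()
    return out


# Tiny made-up frames standing in for the ones read from CSV.
dfs = {
    'aviva': pd.DataFrame({'Close': [100.0, 110.0]}),
    'admiral': pd.DataFrame({'Close': [200.0, 150.0]}),
}

# Apply the same operation to every DataFrame in one line.
dfs = {name: add_returns(df) for name, df in dfs.items()}
```

Adding a fourth dataset then only means adding one more entry to the dict; none of the processing code has to change.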

Upvotes: 1
