Anna T

Reputation: 1

List of DataFrames as an argument of a function/loop

I have multiple DataFrames and I need to perform various operations on them. I want to put them in one list so I don't have to list them all every time, as in the example below:

for df in (df1, df2, df3, df4, df5, df6, df7):
    df.columns = ['COUNTRY', '2018', '2019']
    df.replace({':': ''}, regex=True, inplace=True)
    df.replace({' ': ''}, regex=True, inplace=True)
    df["2018"] = pd.to_numeric(df["2018"], downcast="float")
    df["2019"] = pd.to_numeric(df["2019"], downcast="float")

I tried to make a list of them (DataFrames = [df1, df2, df3, df4, df5, df6, df7]), but it works neither in the loop nor as an argument of a function.

for df in (DataFrame):
    df.columns = ['COUNTRY', '2018', '2019']
    df.replace({':': ''}, regex=True, inplace=True)
    df.replace({' ': ''}, regex=True, inplace=True)
    df["2018"] = pd.to_numeric(df["2018"], downcast="float")
    df["2019"] = pd.to_numeric(df["2019"], downcast="float")

Upvotes: 0

Views: 699

Answers (2)

cyau

Reputation: 449

Using nunvie's answer as a base, here is another option for you:

import pandas as pd

data = {
    'COUNTRY': ['country1', 'country2', 'country3'],
    '2018': ['12.0', '27', '35'],
    '2019': ['2:3', '3:9.6', '4:0.3'],
    '2020': ['35', '42', '56']
}


df_list = [pd.DataFrame(data) for i in range(5)]


def data_prep(df: pd.DataFrame):
    # Explicit copy: the in-place edits below act on a new frame, not the caller's
    df = df.loc[:, ['COUNTRY', '2018', '2019']].copy()

    df.replace({':': ''}, regex=True, inplace=True)
    df.replace({' ': ''}, regex=True, inplace=True)

    df['2018'] = pd.to_numeric(df['2018'], downcast="float")
    df['2019'] = pd.to_numeric(df['2019'], downcast="float")

    return df


# map() is lazy in Python 3; list() runs data_prep on every frame and keeps the results
new_df_list = list(map(data_prep, df_list))

The improvements (in my opinion) are as follows. First, a list comprehension is a more concise way to build the test data (that's not directly related to the answer). Second, pd.to_numeric doesn't have an inplace parameter (at least in pandas 1.2.3). It returns a new, converted Series and leaves the column you passed untouched. Thus, you need to assign the result explicitly: df['my_col'] = pd.to_numeric(df['my_col']).
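A quick way to see this, assuming a toy frame with numbers stored as strings (the frame is only for illustration): calling pd.to_numeric without assigning the result leaves the column as it was, while assigning the returned Series back converts it.

```python
import pandas as pd

# Toy frame for illustration: numbers stored as strings
df = pd.DataFrame({'2018': ['12.0', '27', '35']})

# The returned Series is discarded, so the column keeps its object dtype
pd.to_numeric(df['2018'], downcast="float")
print(df['2018'].dtype)  # object

# Assigning the returned Series back is what actually converts the column
df['2018'] = pd.to_numeric(df['2018'], downcast="float")
print(df['2018'].dtype)  # float32
```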

And third, I've used map to apply the data_prep function to each DataFrame in the list. This makes data_prep responsible for only one DataFrame and also saves you from writing explicit loops. The benefit is leaner and more readable code, if you like its functional flavour, of course.
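One detail worth knowing: in Python 3, map returns a lazy iterator, not a list, so nothing is computed until you iterate over it, and it can only be consumed once. A minimal sketch, using a non-inplace copy of data_prep written for illustration, comparing list(map(...)) with the equivalent list comprehension:

```python
import pandas as pd

data = {
    'COUNTRY': ['country1', 'country2', 'country3'],
    '2018': ['12.0', '27', '35'],
    '2019': ['2:3', '3:9.6', '4:0.3'],
}
df_list = [pd.DataFrame(data) for _ in range(3)]


def data_prep(df: pd.DataFrame) -> pd.DataFrame:
    # Select the columns on a copy so the input frame is left untouched
    df = df.loc[:, ['COUNTRY', '2018', '2019']].copy()
    df = df.replace({':': ''}, regex=True)
    df = df.replace({' ': ''}, regex=True)
    df['2018'] = pd.to_numeric(df['2018'], downcast="float")
    df['2019'] = pd.to_numeric(df['2019'], downcast="float")
    return df


lazy = map(data_prep, df_list)      # a map object; nothing processed yet
new_df_list = list(lazy)            # forces evaluation and stores the results
same_result = [data_prep(df) for df in df_list]  # equivalent comprehension

print(type(lazy).__name__)  # map
print(len(new_df_list))     # 3
print(list(lazy))           # [] because the iterator is already exhausted
```

If you prefer plain comprehensions, the last variant does the same work without the extra list() call.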

Upvotes: 0

nunvie

Reputation: 31

You can place the DataFrames in a list and select the columns like this:

import pandas as pd
from pandas import DataFrame


data = {'COUNTRY': ['country1', 'country2', 'country3'],
        '2018': [12.0, 27, 35],
        '2019': [23, 39.6, 40.3],
        '2020': [35, 42, 56]}

df_list = [DataFrame(data), DataFrame(data), DataFrame(data),
           DataFrame(data), DataFrame(data), DataFrame(data),
           DataFrame(data)]


def change_dataframes(data_frames):
    # "df = ..." below creates a new object, so collect the results
    # instead of expecting the original list to change
    cleaned = []

    for df in data_frames:

        df = df.loc[:, ['COUNTRY', '2018', '2019']].copy()

        df.replace({':': ''}, regex=True, inplace=True)
        df.replace({' ': ''}, regex=True, inplace=True)

        # pd.to_numeric returns a new Series, so assign it back
        df['2018'] = pd.to_numeric(df['2018'], downcast="float")
        df['2019'] = pd.to_numeric(df['2019'], downcast="float")

        cleaned.append(df)

    return cleaned
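Why the results have to be collected: inside a for loop, df = ... only rebinds the loop variable, so the DataFrame stored in the list is not replaced. Changes made to the original object itself (column assignment, methods called with inplace=True) are visible through the list. A small sketch with a toy one-column frame, for illustration only:

```python
import pandas as pd

frames = [pd.DataFrame({'2018': ['1', '2']}) for _ in range(2)]

# Rebinding: df points to a new object, the list still holds the old one
for df in frames:
    df = df.assign(extra=0)
print(list(frames[0].columns))  # ['2018']

# Mutating the original object: the change is visible through the list
for df in frames:
    df['2018'] = pd.to_numeric(df['2018'])
print(frames[0]['2018'].dtype)  # now numeric instead of object

# Collecting the new objects keeps the results of rebinding
frames = [df.assign(extra=0) for df in frames]
print(list(frames[0].columns))  # ['2018', 'extra']
```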

Upvotes: 1
