Apply the same operation to multiple DataFrames efficiently

Question

I have two data frames with the same columns, and similar content.

I'd like to apply the same functions to each, without brute-forcing them or concatenating the dfs. I tried putting the objects into nested dictionaries, but that seems more trouble than it's worth (I don't believe DataFrame.to_dict supports passing into an existing list).

However, it appears that the for loop stores the list of dfs in the df object, and I don't know how to get the results back into the original dfs. See my example below.

import pandas as pd

df1 = {'Column1': [1, 2, 2, 4, 5],
       'Column2': ["A", "B", "B", "D", "E"]}
df1 = pd.DataFrame(df1, columns=['Column1', 'Column2'])

df2 = {'Column1': [2, 11, 2, 2, 14],
       'Column2': ["B", "Y", "B", "B", "V"]}
df2 = pd.DataFrame(df2, columns=['Column1', 'Column2'])


def filter_fun(df1, df2):
    for df in (df1, df2):
        df = df[(df['Column1']==2) & (df['Column2'].isin(['B']))]
    return df1, df2

filter_fun(df1, df2)

Andy Hayden · Accepted Answer

If you write the filter as a function you can apply it in a list comprehension:

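def filter_df(df):
    # Keep rows where Column1 is 2 and Column2 is 'B'.
    # Named filter_df so it doesn't shadow the built-in filter().
    return df[(df['Column1'] == 2) & (df['Column2'].isin(['B']))]


# Rebind both names to the filtered results.
df1, df2 = [filter_df(df) for df in (df1, df2)]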
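
As for why the loop in the question seemed to do nothing: assigning to df inside the loop only rebinds the loop variable, so df1 and df2 keep pointing at the original frames. Here is a minimal, self-contained sketch of that, plus a dict-based variant that may be handier once there are more than a couple of frames. The names keep_twos_and_bs, frames and filtered are made up for illustration; they are not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'Column1': [1, 2, 2, 4, 5],
                    'Column2': ["A", "B", "B", "D", "E"]})
df2 = pd.DataFrame({'Column1': [2, 11, 2, 2, 14],
                    'Column2': ["B", "Y", "B", "B", "V"]})


def keep_twos_and_bs(df):
    # Hypothetical helper name; same condition as the filter in the question.
    return df[(df['Column1'] == 2) & (df['Column2'].isin(['B']))]


# Rebinding the loop variable only changes what `df` points to;
# df1 and df2 still refer to the original, unfiltered frames.
for df in (df1, df2):
    df = keep_twos_and_bs(df)
rows_after_loop = (len(df1), len(df2))  # still (5, 5)

# Assumed alternative: keep the frames in a dict keyed by name, so the
# results can be looked up by key instead of unpacked by position.
frames = {'df1': df1, 'df2': df2}
filtered = {name: keep_twos_and_bs(frame) for name, frame in frames.items()}
rows_after_filter = {name: len(frame) for name, frame in filtered.items()}
```

Either way the function returns a new DataFrame, so you have to capture the return value somewhere (unpacking, a list, or a dict) rather than expecting the originals to change in place.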
