June
June

Reputation: 730

Apply the same operation to multiple DataFrames efficiently

I have two data frames with the same columns, and similar content.

I'd like apply the same functions on each, without having to brute force them, or concatenate the dfs. I tried to pass the objects into nested dictionaries, but that seems more trouble than it's worth (I don't believe dataframe.to_dict supports passing into an existing list).

However, it appears that the for loop stores the list of dfs in the df object, and I don't know how to get it back to the original dfs... see my example below.

df1 = {'Column1': [1,2,2,4,5],
        'Column2': ["A","B","B","D","E"]}
df1 = pd.DataFrame(df1, columns=['Column1','Column2'])

df2 = {'Column1': [2,11,2,2,14],
         'Column2': ["B","Y","B","B","V"]}
df2 = pd.DataFrame(df2, columns=['Column1','Column2'])


def filter_fun(df1, df2):
    for df in (df1, df2):
        df = df[(df['Column1']==2) & (df['Column2'].isin(['B']))]
    return df1, df2

filter_fun(df1, df2)

Upvotes: 0

Views: 2523

Answers (2)

Andy Hayden
Andy Hayden

Reputation: 375475

If you write the filter as a function you can apply it in a list comprehension:

def filter(df):
    return df[(df['Column1']==2) & (df['Column2'].isin(['B']))]


df1, df2 = [filter(df) for df in (df1, df2)]

Upvotes: 4

cs95
cs95

Reputation: 402433

I would recommend concatenation with custom specified keys, because 1) it is easy to assign it back, and 2) you can do the same operation once instead of twice.

# Concatenate df1 and df2
df = pd.concat([df1, df2], keys=['a', 'b'])
# Perform your operation
out = df[(df['Column1'] == 2) & df['Column2'].isin(['B'])]

out.loc['a'] # result for `df1`

   Column1 Column2
1        2       B
2        2       B

out.loc['b'] # result for `df2`

   Column1 Column2
0        2       B
2        2       B
3        2       B

This should work fine for most operations. For groupby, you will want to group on the 0th index level as well.

Upvotes: 2

Related Questions