Zanshin

Reputation: 1272

pass multiple dataframes through a function simultaneously

How to pass df10 and df20 (and even more dataframes) through func simultaneously and keep their names for further use?

import pandas as pd
import numpy as np

df = pd.DataFrame( {
   'A': ['d','d','d','d','d','d','g','g','g','g','g','g','k','k','k','k','k','k'],
   'B': [5,5,6,4,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
   'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013,2012,2013,2014,2015,2016,2014]     
    } );

df10 = (df.B + df.C).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)

df20 = (df['B'] - df['C']).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)

def func(df):
    df1 = df.groupby(level=0, axis=1).sum()
    new_cols= list(zip(df1.columns.get_level_values(0),['total'] * len(df.columns)))
    df1.columns = pd.MultiIndex.from_tuples(new_cols)
    df2 = pd.concat([df1,df], axis=1).sort_index(axis=1).sort_index(axis=1, level=1)
    df2.columns = ['_'.join((col[0], str(col[1]))) for col in df2.columns]
    df2.columns = df2.columns.str.replace('sum_','')
    df2.columns = df2.columns.str.replace('size_','T')
    return df2

EDIT: the dataframes printed, as requested:

print(df10)
print(df20)

df10:

    sum size
S   2012    2013    2014    2015    2016    2012    2013    2014    2015    2016
A                                       
d   13  6   7   5   6   2   1   1   1   1
g   -11 8   8   8   7   2   1   1   1   1
k   -6  9   48  8   -5  1   1   2   1   1



 df20:

    sum size
S   2012    2013    2014    2015    2016    2012    2013    2014    2015    2016
A                                       
d   9   4   5   3   4   2   1   1   1   1
g   -15 6   6   6   5   2   1   1   1   1
k   -10 5   40  4   -9  1   1   2   1   1


Upvotes: 3

Views: 5268

Answers (1)

Chuck

Reputation: 3852

Edit: There is probably a much better way to do this; I just thought I would offer this suggestion. If it is not as required, please let me know, and I will delete.

How to pass df10 and df20 (and even more dataframes) through func simultaneously and keep their names for further use?

If all you want to do is pass multiple dataframes through func, and all your dataframes have the same format, something like the following may work.

For simplicity take the dataframes:

df10 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
df20 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
df30 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})

and a simple function:

def your_func(df):
    # Perform some action on df, e.g. keep only the first row
    df2 = df.head(1)
    return df2

Create a list of your original dataframes:

A = [df10,df20,df30]

A = [   one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0,    
        one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0,    
        one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0]

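Then use a for loop to pass each dataframe in the list through the function. The loop only rebinds the elements of the list, so your original dataframes stay unchanged (as long as the function does not modify its argument in place).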

for i in range(0,len(A)):
    A[i] = your_func(A[i])

Output:

A = [
 one  two
0  1.0  4.0,
 one  two
0  1.0  4.0,
 one  two
0  1.0  4.0]

The list A now contains the new dataframes, while your original dataframes df10, df20, etc. remain unchanged. Index into A (e.g. A[0]) to access the new dataframes.
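If keeping the names matters, one possible variation (not part of the approach above, just a sketch) is to store the dataframes in a dictionary keyed by name instead of a list. The names frames and results are made up for illustration, and your_func is the same placeholder function as before:

```python
import pandas as pd

# Same toy dataframes as above
df10 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
df20 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
df30 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})

def your_func(df):
    # Placeholder transformation: keep only the first row
    return df.head(1)

# Hypothetical names: map each label to its dataframe
frames = {'df10': df10, 'df20': df20, 'df30': df30}

# Apply the function to every dataframe, keeping the labels as keys
results = {name: your_func(frame) for name, frame in frames.items()}

# Look up a processed dataframe by its original name
print(results['df10'])
```

Each result is then available by name (results['df20'] and so on), and the originals are still untouched because the function returns a new object rather than modifying its input.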

Upvotes: 7
