Reputation: 1272
How to pass df10 and df20 (and even more dataframes) through func simultaneously and keep their names for further use?
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': ['d','d','d','d','d','d','g','g','g','g','g','g','k','k','k','k','k','k'],
    'B': [5,5,6,4,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
    'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013,2012,2013,2014,2015,2016,2014]
})
df10 = (df.B + df.C).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)
df20 = (df['B'] - df['C']).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)
def func(df):
    df1 = df.groupby(level=0, axis=1).sum()
    new_cols = list(zip(df1.columns.get_level_values(0), ['total'] * len(df.columns)))
    df1.columns = pd.MultiIndex.from_tuples(new_cols)
    df2 = pd.concat([df1, df], axis=1).sort_index(axis=1).sort_index(axis=1, level=1)
    df2.columns = ['_'.join((col[0], str(col[1]))) for col in df2.columns]
    df2.columns = df2.columns.str.replace('sum_', '')
    df2.columns = df2.columns.str.replace('size_', 'T')
    return df2
EDIT: per request, the dataframes printed:
print(df10)
print(df20)
df10:
     sum                          size
S   2012 2013 2014 2015 2016 2012 2013 2014 2015 2016
A
d     13    6    7    5    6    2    1    1    1    1
g    -11    8    8    8    7    2    1    1    1    1
k     -6    9   48    8   -5    1    1    2    1    1
df20:
     sum                          size
S   2012 2013 2014 2015 2016 2012 2013 2014 2015 2016
A
d      9    4    5    3    4    2    1    1    1    1
g    -15    6    6    6    5    2    1    1    1    1
k    -10    5   40    4   -9    1    1    2    1    1
Upvotes: 3
Views: 5268
Reputation: 3852
Edit: There is probably a much better way to do this; I just thought I would offer this suggestion. If it is not as required, please let me know, and I will delete.
How to pass df10 and df20 (and even more dataframes) through func simultaneously and keep their names for further use?
If all you want to do is pass multiple dataframes through func, and all your dataframes have the same format, something like the following may work.
For simplicity, take the dataframes:
df10 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
df20 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
df30 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
and a simple function:
def your_func(df):
    # Perform some action/change on df, e.g.
    df2 = df.head(1)
    return df2
Create a list of your original dataframes:
A = [df10,df20,df30]
A = [
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0,
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0,
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0]
Then use a for loop to pass each dataframe in the list through the function. As long as the function returns a new dataframe rather than modifying its argument in place, your original dataframes stay unchanged.
for i in range(0, len(A)):
    A[i] = your_func(A[i])
Output:
A = [
   one  two
0  1.0  4.0,
   one  two
0  1.0  4.0,
   one  two
0  1.0  4.0]
So now the list A contains each of the new dataframes, and your original dataframes df10, df20, etc. remain unchanged. Simply index into A (A[0], A[1], ...) to access your new dataframes.
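Since the question also asks about keeping the names, one possible variation is to store the dataframes in a dict instead of a list, so each result can be looked up by a label. This is only a sketch under the same simplifying assumptions as above: the keys 'df10', 'df20', 'df30' and the names frames and results are my own choices for illustration, and your_func again stands in for your real function.

```python
import pandas as pd

# Same toy dataframes as in the example above.
df10 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
df20 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
df30 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})


def your_func(df):
    # Placeholder transformation: returns a new one-row dataframe
    # and does not modify the dataframe passed in.
    return df.head(1)


# Illustrative labels (an assumption): key each dataframe by a name you choose.
frames = {'df10': df10, 'df20': df20, 'df30': df30}

# Apply the function to every dataframe while keeping its label.
results = {name: your_func(frame) for name, frame in frames.items()}

# Look up a processed dataframe by name; df10 itself still has all four rows.
print(results['df10'])
print(len(df10))
```

The dict keeps the association between a label and its processed dataframe, so adding more dataframes only means adding more entries to frames.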
Upvotes: 7