Jia Gao

Reputation: 1292

Define a function that takes other functions as parameters

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'col_1': [1,2,3,4,5,6,7,8,9,10],
                   'col_2': [1,2,3,4,5,6,7,8,9,10],
                   'col_3': ['A','A','A','A','A','B','B','B','B','B']})

The real data I'm using has hundreds of columns. I want to manipulate these columns using different functions, such as min and max, as well as self-defined functions like:

def dist(x):
    return max(x) - min(x)
def HHI(x):
    ss = sum([s**2 for s in x])
    return ss

Instead of writing many lines, I want to have a function like:

def myfunc(cols,fun):
    return df.groupby('col_3')[[cols]].transform(lambda x: fun)
# which would allow me to do something like:

df[['min_' + s for s in cols]] = myfunc(cols, min)
df[['max_' + s for s in cols]] = myfunc(cols, max)
df[['dist_' + s for s in cols]] = myfunc(cols, dist)

Is this possible in Python (my guess is 'yes')?
If so, how?

EDIT ====== ABOUT NAME OF SELF-DEFINED FUNCTION =======
According to jpp's solution, what I asked for is possible, at least for built-in functions; self-defined functions need a little more work.

A workable solution:

temp = df.copy()
for func in ['HHI','dist']:  # names must match the defined functions exactly
    print(func)
    temp[[func + s for s in cols]] = df.pipe(myfunc,cols,eval(func))

The key here is to use the eval function to convert a string into the function it names. There may be a better way to do this, though, and I look forward to seeing one.
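One possible alternative to eval, sketched here under assumptions: keep the functions in a plain dictionary keyed by the prefix you want in the column names. The funcs dictionary and the underscore in the column names are illustrative choices, not part of the original code; myfunc uses the signature from jpp's answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def dist(x):
    return max(x) - min(x)

def HHI(x):
    return sum(s ** 2 for s in x)

def myfunc(df, cols, fun):
    # Apply fun to each selected column within each col_3 group
    return df.groupby('col_3')[cols].transform(fun)

# Hypothetical registry: column prefix -> function object (no eval needed)
funcs = {'HHI': HHI, 'dist': dist}

temp = df.copy()
for name, func in funcs.items():
    temp[[f'{name}_{s}' for s in cols]] = df.pipe(myfunc, cols, func)

print(temp.columns.tolist())
```

A dictionary lookup only ever resolves to functions you registered yourself, whereas eval will execute whatever string it is given.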

EDIT ====== per jpp's comment about name of self-defined function =======

jpp's comment that a function can be fed directly to myfunc is valid based on my test. However, a new column name built from func itself will look something like <function HHI at 0x00000194460019D8>, which is not very readable. The fix is to use temp[[ str(func.__name__) + s for s in cols]]. I hope this helps those who come to this problem later.
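For later readers, here is a minimal sketch of the __name__ approach that also copes with string names such as 'min'. The label helper is a hypothetical name introduced for illustration, and dist and HHI are rewritten with pandas methods, which is an assumption rather than the original definitions.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': list(range(1, 11)),
    'col_2': list(range(1, 11)),
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def dist(x):
    return x.max() - x.min()

def HHI(x):
    return (x ** 2).sum()

def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

def label(fun):
    # Hypothetical helper: a string like 'min' is already a readable name,
    # while a function object exposes its name through __name__
    return fun if isinstance(fun, str) else fun.__name__

temp = df.copy()
for fun in ['min', HHI, dist]:
    temp[[f'{label(fun)}_{s}' for s in cols]] = df.pipe(myfunc, cols, fun)

print(temp.columns.tolist())
```

Note that a lambda has the __name__ '<lambda>', so anonymous functions would still need an explicit prefix.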

Upvotes: 7

Views: 458

Answers (2)

gyx-hh

Reputation: 1431

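Yes, you're very close. Assuming cols is already a list, index with [cols] rather than [[cols]], and call fun(x) inside the lambda instead of returning fun itself: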

def myfunc(cols,fun):
    return df.groupby('col_3')[cols].transform(lambda x: fun(x))

Or:

def myfunc(cols,fun):
    return df.groupby('col_3')[cols].transform(fun)
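As a quick check, the two variants can be compared on the sample data from the question. This is only a sketch: the names with_lambda and direct are made up here to tell the two versions apart, and dist is written with pandas methods.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': list(range(1, 11)),
    'col_2': list(range(1, 11)),
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def dist(x):
    return x.max() - x.min()

def with_lambda(cols, fun):
    # The lambda calls fun on each group's data
    return df.groupby('col_3')[cols].transform(lambda x: fun(x))

def direct(cols, fun):
    # Passing fun itself: transform does the calling
    return df.groupby('col_3')[cols].transform(fun)

a = with_lambda(cols, dist)
b = direct(cols, dist)
print(a.equals(b))
```

The lambda wrapper adds nothing here, so the second form is the simpler one to keep.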

Upvotes: 3

jpp

Reputation: 164683

Here's one way using pd.DataFrame.pipe.

In Python everything is an object and can be passed around with no type-checking. The philosophy is "Don't check if it works, just try it...". Hence you can pass either a string or a function to myfunc, and from there on to transform, without any harmful side-effects.

def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

cols = ['col_1', 'col_2']

df[[f'min_{s}' for s in cols]] = df.pipe(myfunc, cols, 'min')
df[[f'max_{s}' for s in cols]] = df.pipe(myfunc, cols, 'max')
df[[f'dist_{s}' for s in cols]] = df.pipe(myfunc, cols, lambda x: x.max() - x.min())

Result:

print(df)

   col_1  col_2 col_3  min_col_1  min_col_2  max_col_1  max_col_2  dist_col_1  \
0      1      1     A          1          1          5          5           4   
1      2      2     A          1          1          5          5           4   
2      3      3     A          1          1          5          5           4   
3      4      4     A          1          1          5          5           4   
4      5      5     A          1          1          5          5           4   
5      6      6     B          6          6         10         10           4   
6      7      7     B          6          6         10         10           4   
7      8      8     B          6          6         10         10           4   
8      9      9     B          6          6         10         10           4   
9     10     10     B          6          6         10         10           4   

   dist_col_2  
0           4  
1           4  
2           4  
3           4  
4           4  
5           4  
6           4  
7           4  
8           4  
9           4  
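To illustrate the point about strings and functions being interchangeable, the sketch below passes the name 'min' and an equivalent callable through the same myfunc and compares the values. The variable names by_name and by_callable are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': list(range(1, 11)),
    'col_2': list(range(1, 11)),
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def myfunc(df, cols, fun):
    # fun is forwarded untouched; transform accepts a name or a callable
    return df.groupby('col_3')[cols].transform(fun)

by_name = df.pipe(myfunc, cols, 'min')
by_callable = df.pipe(myfunc, cols, lambda x: x.min())

# Compare the values rather than the frames, in case the dtypes differ
print(by_name.to_numpy().tolist() == by_callable.to_numpy().tolist())
```

Where a built-in name such as 'min' exists, the string form is usually the faster choice, since pandas can dispatch to its own optimized implementation.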

Upvotes: 4
