Reputation: 1292
I have a DataFrame that looks like this:
import pandas as pd

df = {'col_1': [1,2,3,4,5,6,7,8,9,10],
      'col_2': [1,2,3,4,5,6,7,8,9,10],
      'col_3': ['A','A','A','A','A','B','B','B','B','B']}
df = pd.DataFrame(df)
The real data I'm using has hundreds of columns. I want to transform these columns with different functions, such as min and max, as well as self-defined functions like:
def dist(x):
    return max(x) - min(x)

def HHI(x):
    ss = sum([s**2 for s in x])
    return ss
Instead of writing many lines, I want a function like:
def myfunc(cols,fun):
    return df.groupby('col_3')[[cols]].transform(lambda x: fun)
# which allows me to do something like:
df[['min_' + s for s in cols]] = myfunc(cols, min)
df[['max_' + s for s in cols]] = myfunc(cols, max)
df[['dist_' + s for s in cols]] = myfunc(cols, dist)
Is this possible in Python (my guess is yes)? If so, how?
EDIT ====== ABOUT NAME OF SELF-DEFINED FUNCTION =======
According to jpp's solution, what I asked for is possible, at least for built-in functions; self-defined functions need a bit more work.
A workable solution:
temp = df.copy()
# the strings must match the names of the functions defined above
for func in ['HHI', 'dist']:
    print(func)
    temp[[func + s for s in cols]] = df.pipe(myfunc, cols, eval(func))
The key here is to use the eval function to convert a string expression into a function. There may be a better way to do this, though, and I look forward to seeing one.
EDIT ====== per jpp's comment about name of self-defined function =======
jpp's suggestion to pass the function itself directly to myfunc works in my tests. However, a new column name built from func then looks something like <function HHI at 0x00000194460019D8>, which is not very readable. The fix is to build the names with temp[[ str(func.__name__) + s for s in cols]]. I hope this helps those who come to this problem later.
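As a sketch of the __name__ approach described above, the loop can iterate over the function objects themselves rather than over strings, which avoids eval altogether. The sample data and the three-argument myfunc follow jpp's answer; the Series-method bodies of dist and HHI and the underscore in the new column names are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': list(range(1, 11)),
    'col_2': list(range(1, 11)),
    'col_3': ['A'] * 5 + ['B'] * 5,
})

def dist(x):
    # Range of the values (assumed vectorised rewrite of the original dist)
    return x.max() - x.min()

def HHI(x):
    # Sum of squares (assumed vectorised rewrite of the original HHI)
    return (x ** 2).sum()

def myfunc(df, cols, fun):
    # Apply fun per group and broadcast the result back to every row
    return df.groupby('col_3')[cols].transform(fun)

cols = ['col_1', 'col_2']
temp = df.copy()

# Loop over the functions themselves; __name__ gives a readable prefix
for func in (HHI, dist):
    new_cols = [f'{func.__name__}_{s}' for s in cols]
    temp[new_cols] = df.pipe(myfunc, cols, func)
```

After the loop, temp has the extra columns HHI_col_1, HHI_col_2, dist_col_1 and dist_col_2. Note that this only works for named functions, since every lambda reports '<lambda>' as its __name__.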
Upvotes: 7
Views: 458
Reputation: 1431
Yes, you're very close:
def myfunc(cols,fun):
    return df.groupby('col_3')[cols].transform(lambda x: fun(x))
Or:
def myfunc(cols,fun):
    return df.groupby('col_3')[cols].transform(fun)
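To spell out the two fixes, here is a small sketch on the sample data from the question. The expression lambda x: fun ignores x and simply returns the function object, so fun is never applied, and [[cols]] wraps the list of names in a second list, so pandas no longer receives a flat list of column names. The Series-method body of dist is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': list(range(1, 11)),
    'col_2': list(range(1, 11)),
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def dist(x):
    # Range of the values (assumed vectorised version of the question's dist)
    return x.max() - x.min()

# Problem 1: this lambda never calls dist, it just hands the function back
broken = lambda x: dist
returned = broken(df['col_1'])   # the function object, not a number

# Problem 2: cols is already a list, so [cols] nests it one level deeper
nested = [cols]                  # a list containing the list of names

# Corrected version: index with cols and pass fun straight to transform
def myfunc(cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

result = myfunc(cols, dist)
```

Here result has the same index as df and one column per entry in cols, so it can be assigned back to new columns as in the question.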
Upvotes: 3
Reputation: 164683
Here's one way using pd.DataFrame.pipe.
With Python everything is an object and can be passed around with no type-checking. The philosophy is "Don't check if it works, just try it...". Hence you can pass either a string or a function to myfunc and thereon to transform without any harmful side-effects.
def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

cols = ['col_1', 'col_2']
df[[f'min_{s}' for s in cols]] = df.pipe(myfunc, cols, 'min')
df[[f'max_{s}' for s in cols]] = df.pipe(myfunc, cols, 'max')
df[[f'dist_{s}' for s in cols]] = df.pipe(myfunc, cols, lambda x: x.max() - x.min())
Result:
print(df)
   col_1  col_2  col_3  min_col_1  min_col_2  max_col_1  max_col_2  dist_col_1  \
0      1      1      A          1          1          5          5           4
1      2      2      A          1          1          5          5           4
2      3      3      A          1          1          5          5           4
3      4      4      A          1          1          5          5           4
4      5      5      A          1          1          5          5           4
5      6      6      B          6          6         10         10           4
6      7      7      B          6          6         10         10           4
7      8      8      B          6          6         10         10           4
8      9      9      B          6          6         10         10           4
9     10     10      B          6          6         10         10           4

   dist_col_2
0           4
1           4
2           4
3           4
4           4
5           4
6           4
7           4
8           4
9           4
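Because transform accepts either a string or a callable, the three assignments above can also be driven from a dictionary that maps a column prefix to a function. This is only a sketch: the funcs dictionary and its keys are hypothetical names chosen for this example. It is mainly useful for lambdas, which have no readable name of their own.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': list(range(1, 11)),
    'col_2': list(range(1, 11)),
    'col_3': ['A'] * 5 + ['B'] * 5,
})

def myfunc(df, cols, fun):
    # fun may be a string such as 'min' or any callable
    return df.groupby('col_3')[cols].transform(fun)

cols = ['col_1', 'col_2']

# Hypothetical mapping: prefix for the new columns -> function to apply
funcs = {
    'min': 'min',
    'max': 'max',
    'dist': lambda x: x.max() - x.min(),
}

for prefix, fun in funcs.items():
    df[[f'{prefix}_{s}' for s in cols]] = df.pipe(myfunc, cols, fun)
```

Adding another statistic then only requires a new entry in funcs.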
Upvotes: 4