Reputation: 143
This is my first question here, so please be patient with me.
My problem is as follows:
Assume we have a pandas DataFrame and we want to dynamically apply some pd.Series methods to a set of its columns. Why doesn't the following example work?
testframe=pd.DataFrame.from_dict({'col1': [1,2] ,'col2': [3,4] })
funcdict={'col1':[pd.Series.astype,str.replace],'col2':[pd.Series.astype,str.replace]}
argdict= {'col1':[['str'],['1','A']],'col2':[['str'],['3','B']]}
for col in testframe.columns:
    for func in funcdict[col]:
        idx=funcdict[col].index(func)
        testframe[col]=testframe[col].func(*argdict[col][idx])
The expected outcome would be
  col1 col2
0  'A'  'B'
1  '2'  '4'
But instead I get
AttributeError: 'Series' object has no attribute 'func'
Remarkably
testframe['col1']=testframe['col1'].astype(*argdict['col1'][0])
works as expected, so Python somehow seems to have a problem with the '.func' syntax, despite the fact that
print(func)
yields the desired output: 'function NDFrame.astype at 0x00000186954EB840' etc.
Upvotes: 4
Views: 674
Reputation: 879631
You could use rgetattr (defined below) to get attributes from the Series, testframe[col].
For example,
In [74]: s = pd.Series(['1','2'])
In [75]: rgetattr(s, 'str.replace')('1', 'A')
Out[75]:
0 A
1 2
dtype: object
import functools
import pandas as pd

def rgetattr(obj, attr, *args):
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

testframe = pd.DataFrame.from_dict({'col1': [1, 2], 'col2': [3, 4]})
funcdict = {'col1': ['astype', 'str.replace'],
            'col2': ['astype', 'str.replace']}
argdict = {'col1': [['str'], ['1', 'A']], 'col2': [['str'], ['3', 'B']]}

for col in testframe.columns:
    for attr, args in zip(funcdict[col], argdict[col]):
        testframe[col] = rgetattr(testframe[col], attr)(*args)
print(testframe)
yields
  col1 col2
0    A    B
1    2    4
getattr is the built-in Python function for getting a named attribute from an object when the name is given as a string. For example, given
In [92]: s = pd.Series(['1','2']); s
Out[92]:
0 1
1 2
dtype: object
we can obtain s.str using
In [85]: getattr(s, 'str')
Out[85]: <pandas.core.strings.StringMethods at 0x7f334a847208>
In [91]: s.str == getattr(s, 'str')
Out[91]: True
To obtain s.str.replace, we would need
In [88]: getattr(getattr(s, 'str'), 'replace')
Out[88]: <bound method StringMethods.replace of <pandas.core.strings.StringMethods object at 0x7f334a847208>>
In [90]: s.str.replace == getattr(getattr(s, 'str'), 'replace')
Out[90]: True
However, if we specify
funcdict = {'col1': ['astype', 'str.replace'],
            'col2': ['astype', 'str.replace']}
then we need some way of handling both the cases that need one call to getattr (e.g. getattr(testframe[col], 'astype')) and those that need multiple calls to getattr (e.g. getattr(getattr(testframe[col], 'str'), 'replace')).
To unify the two cases into one simple syntax, we can use rgetattr, a recursive drop-in replacement for getattr which can handle dotted chains of string attribute names such as 'str.replace'.
The recursion is handled by reduce.
The docs give as an example that reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). Similarly, you can imagine the + being replaced by getattr so that rgetattr(s, 'str.replace') calculates getattr(getattr(s, 'str'), 'replace').
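To see the reduction at work without pandas, here is a minimal sketch that applies the same rgetattr to a hand-built nested object. The config object and its attribute names are made up for illustration; only the use of functools.reduce and getattr is taken from the answer above.

```python
import functools
from types import SimpleNamespace


def rgetattr(obj, attr, *args):
    """Follow a dotted attribute path such as 'a.b.c', one getattr per name."""
    def _getattr(obj, name):
        # *args optionally carries a default, as in getattr(obj, name, default)
        return getattr(obj, name, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))


# Hypothetical nested object, used only for illustration.
config = SimpleNamespace(db=SimpleNamespace(host='localhost', port=5432))

single = rgetattr(config, 'db')               # one getattr call
nested = rgetattr(config, 'db.port')          # getattr(getattr(config, 'db'), 'port')
method = rgetattr(config, 'db.host.upper')    # a bound str method, ready to be called
missing = rgetattr(config, 'db.user', None)   # default instead of AttributeError

print(nested, method(), missing)
```

The single-name and dotted cases go through the same code path, which is why the loop over funcdict does not need to distinguish 'astype' from 'str.replace'.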
Upvotes: 2
Reputation: 164683
Your syntax for calling a method is incorrect. There are two ways you can call a method in Python.
Direct
As you found, this will work. Note that astype isn't referencing some other object; it's the actual name of the method belonging to pd.Series.
testframe['col1'] = testframe['col1'].astype(*argdict['col1'][0])
Functional
The functional method demonstrates explicitly that astype is the name of the method.
from operator import methodcaller
testframe['col1'] = methodcaller('astype', *argdict['col1'][0])(testframe['col1'])
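As a pandas-free sketch of the same idea, methodcaller builds a callable from a method name and its arguments, which can then be applied to any object that has that method. The sample strings below are made up for illustration.

```python
from operator import methodcaller

# Build the call once: "call .replace('1', 'A') on whatever object is passed in".
replace_one = methodcaller('replace', '1', 'A')

# Apply it to plain Python strings (hypothetical sample data).
results = [replace_one(text) for text in ['1', '2', '121']]
print(results)  # ['A', '2', 'A2A']
```

Because the method name is an ordinary string, it can be stored in a dictionary such as funcdict. Note that methodcaller does not split dotted names, so a chain such as 'str.replace' needs an extra lookup step first.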
Trying testframe[col].func(...) will never work as func is not the name of a pd.Series method.
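The difference between a literal attribute name and a name held in a variable can be reproduced with a small stand-in class. This is only a sketch: Greeter and shout are hypothetical names and are not part of pandas.

```python
class Greeter:
    """Hypothetical stand-in for an object with a single method."""
    def shout(self, word):
        return word.upper() + '!'


g = Greeter()
func = Greeter.shout   # a variable that happens to be called "func"

# g.func looks for an attribute literally named "func", which does not exist.
try:
    g.func('hi')
    error = None
except AttributeError as exc:
    error = str(exc)

# Two ways that do use the variable:
via_function = func(g, 'hi')                  # call the function, object first
via_name = getattr(g, func.__name__)('hi')    # look the method up by its name

print(error)
print(via_function, via_name)
```

In the question's code, calling func(testframe[col], *args) would presumably work for pd.Series.astype, but str.replace there is the built-in string method rather than the Series .str accessor, so it would not accept a Series.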
Upvotes: 4