tired

Reputation: 143

Apply a method from a list of methods to pandas dataframe

This is my first question here, so please be patient with me.

My problem is as follows:

Assume we have a pandas DataFrame and we want to dynamically apply some pd.Series methods to a set of its columns. Why doesn't the following example work?

testframe=pd.DataFrame.from_dict({'col1': [1,2] ,'col2': [3,4] })
funcdict={'col1':[pd.Series.astype,str.replace],'col2':[pd.Series.astype,str.replace]}
argdict= {'col1':[['str'],['1','A']],'col2':[['str'],['3','B']]}

for col in testframe.columns:
    for func in funcdict[col]:
            idx=funcdict[col].index(func)
            testframe[col]=testframe[col].func(*argdict[col][idx])

Expected outcome would be

  col1 col2
0  'A'  'B'
1  '2'  '4'

But instead I get

AttributeError: 'Series' object has no attribute 'func'

Remarkably

testframe['col1']=testframe['col1'].astype(*argdict['col1'][0])

works as expected, so somehow Python seems to have a problem with the '.func' syntax, despite the fact that

print(func)

yields the desired output: 'function NDFrame.astype at 0x00000186954EB840' etc.

Upvotes: 4

Views: 674

Answers (2)

unutbu

Reputation: 879631

You could use rgetattr (defined below) to get attributes from the Series testframe[col]. For example,

In [74]: s = pd.Series(['1','2'])

In [75]: rgetattr(s, 'str.replace')('1', 'A')
Out[75]: 
0    A
1    2
dtype: object

import functools
import pandas as pd

def rgetattr(obj, attr, *args):
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

testframe = pd.DataFrame.from_dict({'col1': [1, 2], 'col2': [3, 4]})

funcdict = {'col1': ['astype', 'str.replace'],
            'col2': ['astype', 'str.replace']}

argdict = {'col1': [['str'], ['1', 'A']], 'col2': [['str'], ['3', 'B']]}

for col in testframe.columns:
    for attr, args in zip(funcdict[col], argdict[col]):
        testframe[col] = rgetattr(testframe[col], attr)(*args)
print(testframe)

yields

  col1 col2
0    A    B
1    2    4

getattr is the function in Python's standard library used for getting a named attribute from an object when the name is given in the form of a string. For example, given

In [92]: s = pd.Series(['1','2']); s
Out[92]: 
0    1
1    2
dtype: object

we can obtain s.str using

In [85]: getattr(s, 'str')
Out[85]: <pandas.core.strings.StringMethods at 0x7f334a847208>
In [91]: s.str == getattr(s, 'str')
Out[91]: True

To obtain s.str.replace, we would need

In [88]: getattr(getattr(s, 'str'), 'replace')
Out[88]: <bound method StringMethods.replace of <pandas.core.strings.StringMethods object at 0x7f334a847208>>

In [90]: s.str.replace == getattr(getattr(s, 'str'), 'replace')
Out[90]: True

However, if we specify

funcdict = {'col1': ['astype', 'str.replace'],
            'col2': ['astype', 'str.replace']}

then we need some way of handling both the cases that need one call to getattr (e.g. getattr(testframe[col], 'astype')) and those that need multiple calls to getattr (e.g. getattr(getattr(testframe[col], 'str'), 'replace')).

To unify the two cases into one simple syntax, we can use rgetattr, a recursive drop-in replacement for getattr which can handle dotted chains of string attribute names such as 'str.replace'.

The recursion is handled by reduce. The docs give as an example that reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). Similarly, you can imagine the + being replaced by getattr so that rgetattr(s, 'str.replace') calculates getattr(getattr(s, 'str'), 'replace').
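
As a minimal sketch of that reduction, the snippet below runs rgetattr on a plain stand-in object instead of a Series, so it needs only the standard library. The names outer and inner are hypothetical and exist only for illustration. It also compares the result with operator.attrgetter from the standard library, which accepts dotted names as well.

```python
import functools
from operator import attrgetter
from types import SimpleNamespace

def rgetattr(obj, attr, *args):
    # Same helper as above: fold getattr over each part of the dotted path.
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

# Hypothetical stand-ins: `outer` plays the Series, `inner` plays its .str accessor.
inner = SimpleNamespace(replace="found replace")
outer = SimpleNamespace(str=inner)

# One dotted lookup versus the equivalent nested getattr calls.
nested = rgetattr(outer, 'str.replace')
manual = getattr(getattr(outer, 'str'), 'replace')

# attrgetter resolves the same dotted path.
via_attrgetter = attrgetter('str.replace')(outer)

# The optional third argument acts as a default at every step of the chain.
missing = rgetattr(outer, 'str.nope', None)
```

Unlike rgetattr, attrgetter takes no default value, so a missing attribute anywhere in the chain raises AttributeError.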

Upvotes: 2

jpp

Reputation: 164683

Your syntax for calling a method is incorrect. There are two ways you can call a method in Python.

Direct

As you found, this will work. Note that astype isn't referencing some other object; it's the actual name of the method belonging to pd.Series.

testframe['col1'] = testframe['col1'].astype(*argdict['col1'][0])

Functional

The functional method demonstrates explicitly that astype is the name of the method.

from operator import methodcaller

testframe['col1'] = methodcaller('astype', *argdict['col1'][0])(testframe['col1'])

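Trying testframe[col].func(...) will never work, because attribute access looks up the literal name func on the Series, not the value stored in the variable func, and func is not the name of a pd.Series method.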
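
To make the difference concrete, here is a small sketch that uses a plain string rather than a Series, so it runs without pandas. The variable names text and func are chosen only for illustration. It shows the direct call, the methodcaller form, a lookup by name with getattr, and a call to a function object stored in a variable, followed by the failing .func lookup.

```python
from operator import methodcaller

text = "1-2-1"

# Direct: the method name is written literally after the dot.
direct = text.replace("1", "A")

# Functional: the method name is passed as a string.
functional = methodcaller("replace", "1", "A")(text)

# Lookup by name: getattr resolves a method name held in a string.
by_name = getattr(text, "replace")("1", "A")

# A function object held in a variable is called with the object as first argument.
func = str.replace
unbound = func(text, "1", "A")

# Attribute access ignores the variable and looks for an attribute literally named "func".
try:
    text.func("1", "A")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```

Calling the stored function directly only helps when the function accepts the object as its first argument. The built-in str.replace stored in the question's funcdict is a str method and does not operate on a Series.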

Upvotes: 4
