deonardo_licaprio

Reputation: 318

How to write generic function that acts differently depending on passed pandas column dtype

I'm trying to write a generic function using singledispatch from functools. I want the function to behave differently depending on the type of the passed argument - in this case, it will be a column of a data frame, which can have different dtypes: int64, float64, object, bool, etc.

I tried to do something basic experimentally:

from functools import singledispatch
import pandas as pd

@singledispatch
def sprint(data):
    print('default success')

@sprint.register('float64')
def _(data):
    print('float success')

@sprint.register('int64')
def _(data):
    print('int success')

# test
from sklearn.datasets import load_iris
data_i = load_iris()
df_iris = pd.DataFrame(data_i.data, columns=data_i.feature_names)

sprint(df_iris['sepal length (cm)'])

But obviously I get an error, because Python doesn't look at the dtype property of the column.

Is there any way to work around it?

I'd be grateful for help.

Upvotes: 2

Views: 169

Answers (1)

Fernando Wittmann

Reputation: 2547

The main problem is that you are registering the dtype as a string, so the correct approach would be something like:

from functools import singledispatch

@singledispatch
def sprint(data):
    print('default success')

@sprint.register(float)
def _(data):
    print('float success')

@sprint.register(int)
def _(data):
    print('int success')

sprint(5.6)
# >> float success

But that will still output 'default success' in your code, since the actual type of your input is pandas.core.series.Series. In this case, I would use criteria based on the dtype of the column, for example:

import numpy as np

def sprint(data):
    if data.dtype == np.float64:
        print('float success')
    elif data.dtype == np.int64:
        print('int success')
    else:
        print('default success')
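A quick check against the column from your question (assuming df_iris from your snippet is already defined; its feature columns are all float64):

# iris feature columns are float64
sprint(df_iris['sepal length (cm)'])
# >> float success

# cast to int64 to hit the second branch
sprint(df_iris['sepal length (cm)'].astype('int64'))
# >> int success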

Finally, if what you are trying to do is process a DataFrame based on its dtypes, it might be easier to use sklearn's ColumnTransformer.
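A minimal sketch of that idea, assuming a recent scikit-learn with make_column_selector; the transformer choices here are just illustrative:

from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# route columns to different transformers based on their dtype
ct = ColumnTransformer(transformers=[
    ('numeric', StandardScaler(), make_column_selector(dtype_include='number')),
    ('categorical', OneHotEncoder(), make_column_selector(dtype_include=object)),
])

# transformed = ct.fit_transform(df)  # df is whatever DataFrame you want to process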

Upvotes: 1
