shantanuo

Reputation: 32306

Assign a function as a class method

I want to pass only valid parameters to a function ("read_excel"). I tried the following code, but I am getting an error...

import pandas as pd
expected_keys=['io', 'sheet_name','header', 'names', 'index_col', 'usecols', 'squeeze', 'dtype', 'engine', 
               'converters', 'true_values',  'false_values', 'skiprows', 'nrows', 'na_values', 'parse_dates',
               'date_parser', 'thousands', 'comment', 'skipfooter', 'convert_float']

def safe_read_excel(self,  *args, **kwargs):
    if set(kwargs.keys()).difference(set(expected_keys)): 
        raise ValueError('invalid parameter found')
    return self.read_excel(f_name, *args, **kwargs)

pd.safe_read_excel = safe_read_excel

When I use the default "read_excel" method a dataframe is created...

df= pd.read_excel('sales_summary.xlsx', header=0)

But my custom method throws an error...

df= pd.safe_read_excel('sales_summary.xlsx', header=0)

AttributeError: 'str' object has no attribute 'read_excel'

How do I assign my function as a pandas method?

Upvotes: 0

Views: 429

Answers (3)

Martijn Pieters

Reputation: 1121484

You added a new function to the top-level module of the Pandas library. Functions stored as attributes of a module object do not get bound and are not passed the module object as self (attribute access on a module does not go through the descriptor protocol that binds methods to instances). Remove the self argument and access the read_excel function through the pd reference to the module.

The self variable was instead bound to the 'sales_summary.xlsx' string, which doesn't have a read_excel attribute.
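The difference can be seen without Pandas. The following is a minimal sketch; fake_module, Holder and show_first_arg are hypothetical names used only for illustration:

```python
import types

# Hypothetical stand-in for a library module such as pandas.
fake_module = types.ModuleType("fake_module")


def show_first_arg(self, *args):
    # Return whatever ended up in the "self" slot.
    return self


# Stored on a module: a plain attribute, no binding takes place.
fake_module.show_first_arg = show_first_arg


class Holder:
    pass


# Stored on a class: looked up through an instance, the function is bound.
Holder.show_first_arg = show_first_arg
holder = Holder()

first_via_module = fake_module.show_first_arg("sales_summary.xlsx")
first_via_instance = holder.show_first_arg("sales_summary.xlsx")
```

Called through the module, the "self" parameter receives the filename string; called through the instance, it receives the instance.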

Note that dict.keys(), in Python 3, is a dictionary view object that can be used as a set directly:

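def safe_read_excel(*args, **kwargs):
    # kwargs.keys() is a set-like view; expected_keys must be a set
    if not kwargs.keys() <= expected_keys:
        raise ValueError('invalid parameter found')
    # positional arguments (including the file name) are passed through as-is
    return pd.read_excel(*args, **kwargs)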

The <= comparison is true only if kwargs.keys() is a subset of, or equal to, the names in expected_keys. It avoids building the intermediate set that set.difference() or set_object - set_object creates. For the comparison to work, expected_keys must be a set rather than a list (comparing a dictionary view to a list with <= raises TypeError), and a set also makes the membership tests fast:

expected_keys = {
    'io', 'sheet_name','header', 'names', 'index_col', 'usecols', 'squeeze',
    'dtype', 'engine', 'converters', 'true_values',  'false_values',
    'skiprows', 'nrows', 'na_values', 'parse_dates', 'date_parser',
    'thousands', 'comment', 'skipfooter', 'convert_float'
}
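As a rough illustration of the subset check on its own, the sketch below uses a trimmed set of names; expected, is_valid and unexpected_keys are hypothetical helpers, not part of Pandas:

```python
# Trimmed set of allowed names, for illustration only.
expected = {"io", "sheet_name", "header"}


def is_valid(**kwargs):
    # A dict view compares against a set like any other set.
    return kwargs.keys() <= expected


def unexpected_keys(**kwargs):
    # Set difference on the view yields the offending names.
    return kwargs.keys() - expected


# The same comparison against a list is not supported.
try:
    {"header": 0}.keys() <= ["header"]
    list_comparison_fails = False
except TypeError:
    list_comparison_fails = True
```

Collecting the offending names with the difference operator does create a new set, but it lets the error message say which parameter was rejected.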

In Python 2, you'd use kwargs.viewkeys() instead, to get the same functionality. For a library that needs to support both Python 2 and 3, you can use six.viewkeys() or create your own local versions of what the six library does.

Note that you never need to bind to a module; you already need to have access to the module to add your new function to the namespace, and modules are singletons. Your function will always deal with just the one module object, not with multiple instances of the Pandas library, so there is no need to complicate your codebase with binding support here. Methods need binding only because you can have any number of instances for a single class, and your method needs to have access to a specific instance from those to have access to the instance attributes.

Upvotes: 4

duhaime

Reputation: 27594

You can bind a function to the module object by using types.MethodType from the types module, which lets you refer to the module as self inside the new function:

import pandas as pd
import types

expected_keys=['io', 'sheet_name','header', 'names', 'index_col', 'usecols', 'squeeze', 'dtype', 'engine',
               'converters', 'true_values',  'false_values', 'skiprows', 'nrows', 'na_values', 'parse_dates',
               'date_parser', 'thousands', 'comment', 'skipfooter', 'convert_float']

def safe_read_excel(self, *args, **kwargs):
    if set(kwargs.keys()).difference(set(expected_keys)):
        raise ValueError('invalid parameter found')
    # args already contains the file name, so pass it through only once
    return self.read_excel(*args, **kwargs)

pd.safe_read_excel = types.MethodType(safe_read_excel, pd)

df = pd.safe_read_excel('sales_summary.xlsx', header=0)
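The binding step can be tried without an Excel file. This sketch assumes a hypothetical stand-in module whose read_excel simply echoes its arguments:

```python
import types

# Hypothetical stand-in module; its read_excel just echoes what it receives.
fake_module = types.ModuleType("fake_module")
fake_module.read_excel = lambda path, **kwargs: (path, kwargs)


def safe_read(self, *args, **kwargs):
    # "self" is the object passed as the second argument to MethodType below.
    return self.read_excel(*args, **kwargs)


# Explicitly bind the function to the module object.
fake_module.safe_read = types.MethodType(safe_read, fake_module)

result = fake_module.safe_read("sales_summary.xlsx", header=0)
bound_to = fake_module.safe_read.__self__
```

Here result holds the filename and the keyword arguments exactly as passed, and bound_to is the stand-in module itself.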

Upvotes: 1

FLab

Reputation: 7466

That's because you wrote safe_read_excel as if it were a method of a class, while it is actually a "normal function" (comparable to a static method).

In practice, you do not need self:

def safe_read_excel(f_name,  *args, **kwargs):
    if set(kwargs.keys()).difference(set(expected_keys)): 
        raise ValueError('invalid parameter found')
    return pd.read_excel(f_name, *args, **kwargs)

I changed the first parameter of the function from self to f_name and changed the return statement to call pd.read_excel.

Upvotes: 3
