Reputation: 32306
I want to pass only valid parameters to a function ("read_excel"). I tried the following code, but I am getting an error:
import pandas as pd
expected_keys = ['io', 'sheet_name', 'header', 'names', 'index_col', 'usecols', 'squeeze', 'dtype', 'engine',
                 'converters', 'true_values', 'false_values', 'skiprows', 'nrows', 'na_values', 'parse_dates',
                 'date_parser', 'thousands', 'comment', 'skipfooter', 'convert_float']
def safe_read_excel(self, *args, **kwargs):
    if set(kwargs.keys()).difference(set(expected_keys)):
        raise ValueError('invalid parameter found')
    return self.read_excel(f_name, *args, **kwargs)
pd.safe_read_excel = safe_read_excel
When I use the default "read_excel" method a dataframe is created...
df= pd.read_excel('sales_summary.xlsx', header=0)
But my custom method throws an error...
df= pd.safe_read_excel('sales_summary.xlsx', header=0)
AttributeError: 'str' object has no attribute 'read_excel'
How do I assign my function as a pandas method?
Upvotes: 0
Views: 429
Reputation: 1121484
You added a new function to the top-level module of the Pandas library. Function attributes of a module object do not get bound and are not passed the module object as self (looking up an attribute on a module does not invoke the descriptor protocol). Remove the self argument and access the read_excel function through the pd reference to the module.
The self variable was instead bound to the 'sales_summary.xlsx' string, which doesn't have a read_excel attribute.
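To make the binding difference concrete, here is a minimal sketch that runs without pandas. The names fake_module, Holder and show_first_arg are made up for illustration; fake_module merely stands in for an imported module such as pd.

```python
import types

def show_first_arg(first, *args, **kwargs):
    # Return whatever ended up in the first positional slot.
    return first

# Hypothetical stand-in for an imported module such as pandas.
fake_module = types.ModuleType("fake_module")
fake_module.show_first_arg = show_first_arg

class Holder:
    # The same function stored on a class becomes a bound method on instances.
    show_first_arg = show_first_arg

module_result = fake_module.show_first_arg("sales_summary.xlsx")
holder = Holder()
instance_result = holder.show_first_arg("sales_summary.xlsx")

print(module_result)              # the string itself: no binding happened
print(instance_result is holder)  # True: the instance was passed first
```

Called through the module, the function receives the file name in its first parameter, which is exactly what happened to self in the question.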
Note that dict.keys(), in Python 3, is a dictionary view object that can be used as a set directly:
def safe_read_excel(*args, **kwargs):
    if not kwargs.keys() <= expected_keys:
        raise ValueError('invalid parameter found')
    return pd.read_excel(*args, **kwargs)
The <= operation is only true if kwargs.keys() is a subset of or equal to the names in expected_keys. This is more efficient than using set.difference() or set_object - set_object, as no new set object needs to be created. For this to work, expected_keys has to be a set object rather than a list: the comparison operators on a key view only accept sets and other views (a list raises TypeError), and a set also speeds up the membership tests:
expected_keys = {
    'io', 'sheet_name', 'header', 'names', 'index_col', 'usecols', 'squeeze',
    'dtype', 'engine', 'converters', 'true_values', 'false_values',
    'skiprows', 'nrows', 'na_values', 'parse_dates', 'date_parser',
    'thousands', 'comment', 'skipfooter', 'convert_float'
}
In Python 2, you'd use kwargs.viewkeys() instead, to get the same functionality. For a library that needs to support both Python 2 and 3, you can use six.viewkeys() or create your own local versions of what the six library does.
Note that you never need to bind to a module; you already need to have access to the module to add your new function to the namespace, and modules are singletons. Your function will always deal with just the one module object, not with multiple instances of the Pandas library, so there is no need to complicate your codebase with binding support here. Methods need binding only because you can have any number of instances for a single class, and your method needs to have access to a specific instance from those to have access to the instance attributes.
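A quick way to convince yourself that a module is a single shared object is shown below. It uses the standard-library json module purely as an example; any imported module behaves the same way.

```python
import importlib
import sys

import json
import json as json_again

# Every import of the same name hands back the one cached module object.
same_by_alias = json is json_again
same_in_cache = sys.modules["json"] is json
same_by_import_module = importlib.import_module("json") is json

print(same_by_alias, same_in_cache, same_by_import_module)  # True True True
```

Because there is only ever one such object, a function stored on it gains nothing from receiving that object as an argument.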
Upvotes: 4
Reputation: 27594
You can bind a new function to the module object by using the types module, which will allow you to do introspection / refer to self (here, the module itself) inside the new method:
import pandas as pd
import types
expected_keys = ['io', 'sheet_name', 'header', 'names', 'index_col', 'usecols', 'squeeze', 'dtype', 'engine',
                 'converters', 'true_values', 'false_values', 'skiprows', 'nrows', 'na_values', 'parse_dates',
                 'date_parser', 'thousands', 'comment', 'skipfooter', 'convert_float']
def safe_read_excel(self, *args, **kwargs):
    if set(kwargs.keys()).difference(set(expected_keys)):
        raise ValueError('invalid parameter found')
    # args already starts with the file name, so pass it through only once
    return self.read_excel(*args, **kwargs)
# bind the function to the module object, so that self is pd
pd.safe_read_excel = types.MethodType(safe_read_excel, pd)
df = pd.safe_read_excel('sales_summary.xlsx', header=0)
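If you want to see what types.MethodType does here without having pandas or an Excel file at hand, the following sketch uses a hypothetical stand-in module named fake_pd with a dummy read_excel that just echoes its arguments.

```python
import types

# Hypothetical stand-in for the pandas module, so the sketch runs without pandas.
fake_pd = types.ModuleType("fake_pd")

def read_excel(io, **kwargs):
    # Dummy reader: echo what it received instead of parsing a file.
    return (io, kwargs)

fake_pd.read_excel = read_excel

def safe_read_excel(self, *args, **kwargs):
    # 'self' is the module object because of the explicit binding below.
    return self.read_excel(*args, **kwargs)

fake_pd.safe_read_excel = types.MethodType(safe_read_excel, fake_pd)

result = fake_pd.safe_read_excel("sales_summary.xlsx", header=0)
bound_to_module = fake_pd.safe_read_excel.__self__ is fake_pd

print(result)           # ('sales_summary.xlsx', {'header': 0})
print(bound_to_module)  # True
```

The bound method carries the module in its __self__ attribute, so the caller no longer supplies it.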
Upvotes: 1
Reputation: 7466
That's because you wrote your safe_read_excel function as if it were a method of a class, while it is a "normal function" (or static method).
In practical terms, you do not need self:
def safe_read_excel(f_name, *args, **kwargs):
    if set(kwargs.keys()).difference(set(expected_keys)):
        raise ValueError('invalid parameter found')
    return pd.read_excel(f_name, *args, **kwargs)
I changed the first input of the function from self to f_name and changed the return to pd.read_excel.
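Here is a small, self-contained way to try the plain-function version without an Excel file. fake_read_excel is a hypothetical stand-in for pd.read_excel, EXPECTED_KEYS is a shortened whitelist, and listing the offending names in the error message is my own addition.

```python
# Shortened, hypothetical whitelist of accepted keyword names.
EXPECTED_KEYS = {"sheet_name", "header", "names", "usecols"}

def fake_read_excel(f_name, *args, **kwargs):
    # Stand-in for pd.read_excel: report what would have been read.
    return {"file": f_name, "options": kwargs}

def safe_read_excel(f_name, *args, **kwargs):
    unexpected = set(kwargs).difference(EXPECTED_KEYS)
    if unexpected:
        # Name the offending keywords to make typos easy to spot.
        raise ValueError("invalid parameter found: " + ", ".join(sorted(unexpected)))
    return fake_read_excel(f_name, *args, **kwargs)

ok = safe_read_excel("sales_summary.xlsx", header=0)

error_message = None
try:
    safe_read_excel("sales_summary.xlsx", headr=0)
except ValueError as exc:
    error_message = str(exc)

print(ok)             # {'file': 'sales_summary.xlsx', 'options': {'header': 0}}
print(error_message)  # invalid parameter found: headr
```

Swapping fake_read_excel for pd.read_excel and using the full key list gives the function from this answer.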
Upvotes: 3