Reputation: 1243
I'm building a database-type object which, when an index is not found, uses an API to retrieve the information, saves it to the object/file, and returns it.
I'd like to do this by overloading the .loc[x, y] method of the pandas DataFrame, but I can't work out how to do this!
At the moment I have:
import pandas as pd

pd.set_option('io.hdf.default_format', 'table')


class DataBase(pd.DataFrame):
    """DataBase object which can be updated by an external API."""

    def __init__(self, path, api=None):
        # Load the stored frame from `path` (key 'df') instead of a hard-coded file
        super(DataBase, self).__init__(pd.read_hdf(path, 'df'))
        self.api = api
I may want to change the __init__ function to include a where argument so I can read only what I need.
I can't think of a way to overload the .loc method properly!
Also, HDF5 is just one storage method. I'd like to retain the ability to use other storage methods such as SQL, or even CSV if necessary.
Upvotes: 2
Views: 2248
Reputation: 33
Adding years later to the answer above: if you ever end up subclassing the basic pandas classes, you can override some constructor properties to ensure that the new class is preserved through the standard pandas manipulations. From the pandas Internals documentation, e.g.:
@property
def _constructor(self):
    return MyDataFrame
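To make the fragment above concrete, here is a minimal sketch of a subclass that survives ordinary operations. It assumes a reasonably recent pandas; the `api` attribute and its placeholder string value are only illustrative. Listing the attribute in `_metadata` is what lets pandas carry it over to results, and `__init__` takes `*args` because pandas may call the constructor with an existing frame as the first argument.

```python
import pandas as pd


class MyDataFrame(pd.DataFrame):
    # Names listed in _metadata are stored as plain instance attributes
    # (not columns) and are copied onto results by __finalize__.
    _metadata = ["api"]

    def __init__(self, *args, api=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.api = api

    @property
    def _constructor(self):
        # pandas calls this to build the result of most operations
        return MyDataFrame


# "hypothetical-api-handle" is a placeholder for a real API client
df = MyDataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, api="hypothetical-api-handle")
subset = df[df["a"] > 1]  # boolean filtering keeps the subclass
copied = df.copy()        # copy() also carries the _metadata attributes over
```

Without the `_constructor` override, `subset` and `copied` would come back as plain `pd.DataFrame` objects.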
Upvotes: 1
Reputation: 40713
loc is a property that returns an attribute called _loc if it is not None; otherwise it creates a pandas.core.indexing._LocIndexer on demand. Indexers, by default, have access to the DataFrame that created them, so you can modify the DataFrame on a key miss.
You can override the behaviour of DataFrame.loc by subclassing DataFrame and _LocIndexer, like this:
from pandas import DataFrame
from pandas.core.indexing import _LocIndexer


class MyLocIndexer(_LocIndexer):
    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            # `db` stands for whatever API/database client you use
            item = db.fetch_item(key)
            self[key] = item
            # `return self[key]` is better as it also works when
            # accessing a whole axis
            return item


class MyDataFrame(DataFrame):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._loc = MyLocIndexer(self, "loc")
The above is written in Python 3, so you will have to fix the super() calls if you are using Python 2.
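As far as I can tell, newer pandas releases (roughly 1.0 onwards) no longer cache the indexer in `_loc`; the `loc` property builds a fresh `_LocIndexer("loc", self)` on every access, so assigning `_loc` in `__init__` has no effect there. A sketch of the same idea for those versions overrides the `loc` property instead. `_LocIndexer` is private API and may change between releases; `FetchingLocIndexer`, `FetchingDataFrame`, `fetch_row` and `fake_api` are hypothetical names, and only a single missing row label is handled.

```python
import pandas as pd
from pandas.core.indexing import _LocIndexer  # private API, may change


class FetchingLocIndexer(_LocIndexer):
    """`.loc` indexer that fills in a missing row label via a callback."""

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            frame = self.obj  # the DataFrame this indexer belongs to
            # This sketch only handles one missing row label.
            if isinstance(key, (tuple, list, slice)) or frame.fetch_row is None:
                raise
            # Setting with enlargement adds the fetched row in place.
            super().__setitem__(key, frame.fetch_row(key))
            return super().__getitem__(key)


class FetchingDataFrame(pd.DataFrame):
    _metadata = ["fetch_row"]  # keep the callback as a real attribute

    def __init__(self, *args, fetch_row=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.fetch_row = fetch_row

    @property
    def loc(self):
        return FetchingLocIndexer("loc", self)


calls = []


def fake_api(label):
    """Stand-in for the external API: one value per column."""
    calls.append(label)
    return [7, 8]


df = FetchingDataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"],
                       fetch_row=fake_api)
row = df.loc["z"]    # miss: fetched through fake_api, stored, returned
again = df.loc["z"]  # hit: served from the frame, no second API call
```

Persisting the enlarged frame back to HDF5, SQL or CSV is left out here; it could be done inside the `except` branch after the row has been stored.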
Upvotes: 4