Reputation: 1453
I need to have a new type, say MyTuple
, so that I can create objects like this: obj = MyTuple((1,2,3))
, such as:
obj
behaves exactly as a native tuple (also in performance) isinstance(obj, tuple)
returns False
.The reason behind this is that I need to use tuples as indexes in Pandas, but when Pandas detects that the values of the index are tuples, it uses multiindexes instead, which I don't want.
Thus, the following does not work:
class MyTuple(tuple):
pass
This fulfills my first requirement but not the second one, so if I use MyTuple
objects as indexes, Pandas still creates multiindexes from them.
Another solution is to use composition instead of inheritance, implementing the Sequence
abc and having the true tuple as an object attribute, providing wrapper methods around it:
from collections.abc import Sequence
class MyTuple(Sequence):
def __init__(self, initlist=None):
self.data = () # A true tuple is stored in the object
if initlist is not None:
if type(initlist) == type(self.data): self.data = initlist
elif isinstance(initlist, MyTuple): self.data = initlist.data
else: self.data = tuple(initlist)
def __getitem__(self, i): return self.data[i]
def __len__(self): return len(self.data)
def __hash__(self): return hash(self.data)
def __repr__(self): return repr(self.data)
def __eq__(self, other): return self.data == other
def __iter__(self): yield from self.data.__iter__()
This type fulfills the second requirement (isinstance(obj, tuple)
returns False
), and provides the same interface than a true tuple
(you can access the elements via indexes, you can compare it with another tuples, you can use it as dictionary keys, etc). Syntactically and semantically this solution is good for me.
However it is not a true tuple
in terms of performance. In my application I have to perform tons of comparisons betweens these objects (and of these objects with true tuple
s), so the method MyTuple.__eq__()
is called tons of times. This introduces a performance penalty. Using MyTuple
instead of true tuples, my program multiplies by six the runtime.
Then, what I need is something like my first attempt (a class which inherits from tuple
), but which later can "lie" about being a tuple, if asked via isinstance()
(because this is how Pandas finds out if it is a tuple and thus should create a multiindex).
I read about Python's datamodel and __instancecheck__()
methods, but I think that they are not useful here, because I should implement those methods in tuple
, instead of MyTuple
, but this is not possible.
Perhaps some tricks with metaclasses would do it, but I do not fully understand the concept to see its relationship with this problem.
Can I achieve my goals somehow?
Upvotes: 0
Views: 385
Reputation: 11950
class MyTuple(object):
def __init__(self, iterable):
self.data = tuple(iterable)
def __getitem__(self, i):
return tuple.__getitem__(self.data, i)
t = MyTuple((1, 2, 3))
print(t[1])
print(isinstance(t, tuple))
Other methods analogously.
Still not a true tuple
performancewise, but the closest I can think of... probably.
Upvotes: 1
Reputation: 77910
The main problem you'll hit with performance is that Python's underlying collateral is implemented in highly-optimized C (or other language -- there are a dozen or so good implementations, and more coming).
When you implement something in Python, remember that it's interpreted, or partially-compiled at best. You cannot get optimum performance when each line has to be reinterpreted at run time, even given good intermediate code.
Upvotes: 0