JLDiaz
JLDiaz

Reputation: 1453

Faking a tuple class in python

I need to have a new type, say MyTuple, so that I can create objects like this: obj = MyTuple((1,2,3)), such as:

  1. obj behaves exactly as a native tuple (also in performance)
  2. isinstance(obj, tuple) returns False.

The reason behind this is that I need to use tuples as indexes in Pandas, but when Pandas detects that the values of the index are tuples, it uses multiindexes instead, which I don't want.

Thus, the following does not work:

class MyTuple(tuple):
    pass

This fulfills my first requirement but not the second one, so if I use MyTuple objects as indexes, Pandas still creates multiindexes from them.

Another solution is to use composition instead of inheritance, implementing the Sequence abc and having the true tuple as an object attribute, providing wrapper methods around it:

from collections.abc import Sequence
class MyTuple(Sequence):
    def __init__(self, initlist=None):
        self.data = ()   # A true tuple is stored in the object
        if initlist is not None:
            if type(initlist) == type(self.data): self.data = initlist
            elif isinstance(initlist, MyTuple): self.data = initlist.data
            else:  self.data = tuple(initlist)
    def __getitem__(self, i): return self.data[i]
    def __len__(self): return len(self.data)
    def __hash__(self): return hash(self.data)
    def __repr__(self): return repr(self.data)
    def __eq__(self, other): return self.data == other
    def __iter__(self): yield from self.data.__iter__()

This type fulfills the second requirement (isinstance(obj, tuple) returns False), and provides the same interface than a true tuple (you can access the elements via indexes, you can compare it with another tuples, you can use it as dictionary keys, etc). Syntactically and semantically this solution is good for me.

However it is not a true tuple in terms of performance. In my application I have to perform tons of comparisons betweens these objects (and of these objects with true tuples), so the method MyTuple.__eq__() is called tons of times. This introduces a performance penalty. Using MyTuple instead of true tuples, my program multiplies by six the runtime.

Then, what I need is something like my first attempt (a class which inherits from tuple), but which later can "lie" about being a tuple, if asked via isinstance() (because this is how Pandas finds out if it is a tuple and thus should create a multiindex).

I read about Python's datamodel and __instancecheck__() methods, but I think that they are not useful here, because I should implement those methods in tuple, instead of MyTuple, but this is not possible.

Perhaps some tricks with metaclasses would do it, but I do not fully understand the concept to see its relationship with this problem.

Can I achieve my goals somehow?

Upvotes: 0

Views: 385

Answers (2)

bipll
bipll

Reputation: 11950

class MyTuple(object):
    def __init__(self, iterable):
        self.data = tuple(iterable)
    def __getitem__(self, i):
        return tuple.__getitem__(self.data, i)

t = MyTuple((1, 2, 3))
print(t[1])
print(isinstance(t, tuple))

Other methods analogously.

Still not a true tuple performancewise, but the closest I can think of... probably.

Upvotes: 1

Prune
Prune

Reputation: 77910

The main problem you'll hit with performance is that Python's underlying collateral is implemented in highly-optimized C (or other language -- there are a dozen or so good implementations, and more coming).

When you implement something in Python, remember that it's interpreted, or partially-compiled at best. You cannot get optimum performance when each line has to be reinterpreted at run time, even given good intermediate code.

Upvotes: 0

Related Questions