OliasailO
OliasailO

Reputation: 512

A Faster Nested Tuple to List and Back

I'm trying to perform tuple to list and list to tuple conversions on nested sequences of unknown depth and shape. The calls are being made hundreds of thousands of times, which is why I'm trying to squeeze out as much speed as possible.

Any help is much appreciated.

Here's what I have so far...

def listify(self, seq, was, toBe):
  temp = []
  a = temp.append
  for g in seq:
    if type(g) == was:
      a(self.listify(g, was, toBe))
    else:
      a(g)
  return toBe(temp)

And a call for tuple to list would look like this:

self.listify((...), tuple, list)

Edit: Yeah, I totally missed both the enumerate (from an old implementation) and forgot to type the else part.

Thanks for the help both of you. I'll probably go with the coroutines.

Upvotes: 3

Views: 1525

Answers (3)

saeedgnu
saeedgnu

Reputation: 4376

Since the answers above do not deal with tuples or lists inside dictionary values, I post my own code:

def tuple2list(data):
    if isinstance(data, dict):
        return {
            key: tuple2list(value)
            for key, value in data.items()
        }
    elif isinstance(data, (list, tuple)):
        return [
            tuple2list(item)
            for item in data
        ]
    return data

def list2tuple(data):
    if isinstance(data, dict):
        return {
            key: list2tuple(value)
            for key, value in data.items()
        }
    elif isinstance(data, (list, tuple)):
        return tuple(
            list2tuple(item)
            for item in data
        )
    return data

Upvotes: 0

David
David

Reputation: 1411

I have been working with coroutines quiet a lot lately. The advantage would be that you reduce the overhead of the method calls. Sending a new value into a coroutine is faster than calling a function. While you can not make a recursive coroutine, it will throw a ValueError: generator already executing but you could make a pool of coroutine workers - you need one worker for every level of the tree. I have made some test code that works, but have not looked at the timing issues yet.

def coroutine(func):
    """ A helper function decorator from Beazley"""
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        g.next()
        return g
    return start

@coroutine
def cotuple2list():
    """This does the work"""
    result = None
    while True:
        (tup, co_pool) = (yield result)
        result = list(tup)
        # I don't like using append. So I am changing the data in place.
        for (i,x) in enumerate(result):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,tuple):
                result[i] = co_pool[0].send((x, co_pool[1:]))


@coroutine
def colist2tuple():
    """This does the work"""
    result = None
    while True:
        (lst, co_pool) = (yield result)
        # I don't like using append so I am changing the data in place...
        for (i,x) in enumerate(lst):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,list):
                lst[i] = co_pool[0].send((x, co_pool[1:]))
        result = tuple(lst)

Pure python alternative from HYRY's post:

def list2tuple(a):
    return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))
def tuple2list(a):
    return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))

Make a pool of coroutines - this is a hack of a pool, but it works:

# Make Coroutine Pools
colist2tuple_pool = [colist2tuple() for i in xrange(20) ]
cotuple2list_pool = [cotuple2list() for i in xrange(20) ]

Now do some timing - comparing to :

def make_test(m, n):
    # Test data function taken from HYRY's post!
    return [[range(m), make_test(m, n-1)] for i in range(n)]
import timeit
t = make_test(20, 8)
%timeit list2tuple(t)
%timeit colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))

Results - notice the 'u' next to the 's' in the second line :-)

1 loops, best of 3: 1.32 s per loop
1 loops, best of 3: 4.05 us per loop

Really seems too fast to believe. Anybody know if timeit works with coroutines? Here is the old fashioned way:

tic = time.time()
t1 = colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))
toc = time.time()
print toc - tic

result:

0.000446081161499

Newer versions of Ipython and %timit give a warning:

The slowest run took 9.04 times longer than the fastest. This could
mean that an intermediate result is being cached 1000000 loops, best of 3: 317 ns per loop

After some further investigation, python generators are not magic and send is still a function call. The reason my generator based method appeared to be faster is that I was doing an inplace operation on the lists - which resulted in fewer function calls.

I wrote all this out with lots of additional detail in a recent talk.

Hope this helps someone looking to play with generators.

Upvotes: 6

HYRY
HYRY

Reputation: 97331

Define two function separately:

def list2tuple(a):
    return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))

def tuple2list(a):
    return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))

some test:

t = [1, 2, [3, 4], [5, [7, 8]], 9]
t2 = list2tuple(t)
t3 = tuple2list(t2)
print t2
print t3

The results:

(1, 2, (3, 4), (5, (7, 8)), 9)
[1, 2, [3, 4], [5, [7, 8]], 9]

Edit: for fast version:

def list2tuple2(a, tuple=tuple, type=type, list=list):
    return tuple([list2tuple2(x) if type(x)==list else x for x in a])

def tuple2list2(a, tuple=tuple, type=type):
    return [tuple2list2(x) if type(x)==tuple else x for x in a]

To compare I also include cython version:

%%cython

def list2tuple3(a):
    return tuple([list2tuple3(x) if type(x)==list else x for x in a])

def tuple2list3(a):
    return [tuple2list3(x) if type(x)==tuple else x for x in a]

Create some nested list:

def make_test(m, n):
    return [[range(m), make_test(m, n-1)] for i in range(n)]

t = make_test(20, 8)
t2 = list2tuple2(t)

Then compare the speed:

%timeit listify(t, list, tuple)
%timeit listify(t2, tuple, list)
%timeit list2tuple(t)
%timeit tuple2list(t2)
%timeit list2tuple2(t)
%timeit tuple2list2(t2)
%timeit list2tuple3(t)
%timeit tuple2list3(t2)

The result is:

listify
1 loops, best of 3: 828 ms per loop
1 loops, best of 3: 912 ms per loop

list2tuple generator expression version
1 loops, best of 3: 1.49 s per loop
1 loops, best of 3: 1.67 s per loop

list2tuple2 list comprehension with local cache
1 loops, best of 3: 623 ms per loop
1 loops, best of 3: 566 ms per loop

list2tuple3 cython
1 loops, best of 3: 212 ms per loop
10 loops, best of 3: 232 ms per loop

Upvotes: 3

Related Questions