Chris James

Reputation: 161

Most efficient way of creating a tuple from list of tuples

I am currently using a for loop with enumerate to extract from a list of tuples below:

[(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]

What I want is to end up with the following tuple: ('handle', 'Firstname', 'Surname', 'Callname', 'Gender')

What would be the most efficient way of accomplishing this without enumerating through them and creating a new tuple or is this the only way?

Upvotes: 1

Views: 382

Answers (3)

Martijn Pieters

Reputation: 1123032

Create a new tuple by enumerating through them:

tuple(t[1] for t in inputlist)

This uses a generator expression to pass each second element from the tuples in inputlist to the tuple() constructor.

If you just need a sequence and a list would do, then use a list comprehension:

[t[1] for t in inputlist]

Lists fit arbitrary-length, ordered, homogeneous data sets (such as you have here) better than tuples do; see What's the difference between lists and tuples?

If raw speed is required and readability can be de-emphasised, use map() and operator.itemgetter() to move iteration and extraction to optimised C code:

from operator import itemgetter

labels_tup = tuple(map(itemgetter(1), inputlist))
labels_list = list(map(itemgetter(1), inputlist))

However, I'd avoid doing this unless extracting a bunch of strings out of a list of tuples is on a critical path and / or repeated a lot. Readability counts!
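To make the itemgetter() variant less opaque: itemgetter(1) builds a callable that returns obj[1] for whatever object it is given, so mapping it over the list does the same work as the generator expression. A minimal sketch, using a shortened copy of the question's data (the names rows and get_name are illustrative, not from the original):

```python
from operator import itemgetter

# Shortened copy of the rows from the question (illustrative name).
rows = [
    (0, 'handle', 'VARCHAR(50)', 1, None, 1),
    (1, 'Firstname', 'TEXT', 1, None, 0),
    (2, 'Surname', 'TEXT', 1, None, 0),
]

# itemgetter(1) behaves like: lambda t: t[1]
get_name = itemgetter(1)

via_itemgetter = tuple(map(get_name, rows))
via_genexpr = tuple(t[1] for t in rows)

print(via_itemgetter)
print(via_itemgetter == via_genexpr)  # both approaches give the same tuple
```

Both spellings still visit every row once; the difference is only where the per-row indexing happens.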

"without enumerating through them and creating a new tuple"

You can't avoid this. You a) want one element from each tuple in a sequence, and b) need a tuple object as output, an immutable type. While you could write 5 separate statements indexing into inputlist to access each value, doing so would be inefficient, would create needlessly repeated code, and would break the moment your input doesn't have exactly 5 elements.
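As a rough illustration of that last point, the sketch below compares hard-coded indexing with the generator expression. The names hard_coded, generic, and shorter are made up for this example; the data is the list from the question.

```python
inputlist = [
    (0, 'handle', 'VARCHAR(50)', 1, None, 1),
    (1, 'Firstname', 'TEXT', 1, None, 0),
    (2, 'Surname', 'TEXT', 1, None, 0),
    (3, 'Callname', 'TEXT', 1, None, 0),
    (4, 'Gender', 'INTEGER', 1, None, 0),
]

# Hard-coded indexing: only correct when there are exactly five rows.
hard_coded = (
    inputlist[0][1], inputlist[1][1], inputlist[2][1],
    inputlist[3][1], inputlist[4][1],
)

# Generator expression: adapts to however many rows there are.
generic = tuple(t[1] for t in inputlist)
print(hard_coded == generic)

# With fewer rows the generic version still works...
shorter = inputlist[:3]
print(tuple(t[1] for t in shorter))

# ...while the hard-coded index for the fifth row no longer exists.
try:
    shorter[4][1]
except IndexError as exc:
    print("hard-coded indexing fails:", exc)
```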

Demo:

>>> inputlist = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]
>>> tuple(t[1] for t in inputlist)
('handle', 'Firstname', 'Surname', 'Callname', 'Gender')
>>> [t[1] for t in inputlist]
['handle', 'Firstname', 'Surname', 'Callname', 'Gender']

Upvotes: 5

jamylak

Reputation: 133604

>>> from operator import itemgetter
>>> data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]
>>> tuple(map(itemgetter(1), data))
('handle', 'Firstname', 'Surname', 'Callname', 'Gender')

This seems to be the fastest in raw speed, though only slightly, since it keeps as much of the work as possible in C. I also like the look of it. Of course, you are still looping through the elements.

Timings:

$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]; from operator import itemgetter;" "tuple(map(itemgetter(1), data))"
500000 loops, best of 5: 477 nsec per loop
$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]; from operator import itemgetter;" "tuple(t[1] for t in data)"
500000 loops, best of 5: 566 nsec per loop
$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]*1000; from operator import itemgetter;" "tuple(map(itemgetter(1), data))"
2000 loops, best of 5: 146 usec per loop
$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]*1000; from operator import itemgetter;" "tuple(t[1] for t in data)"
1000 loops, best of 5: 212 usec per loop
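If you would rather reproduce the comparison from a script than from the shell, something along these lines should work with the standard timeit module. This is only a sketch: the function names and the repeat/number values are arbitrary choices made here, wrapping each statement in a function adds call overhead that the command-line runs above do not have, and the absolute numbers will differ by machine and Python version.

```python
import timeit
from operator import itemgetter

# Same rows as above, repeated 1000 times as in the larger timing runs.
data = [
    (0, 'handle', 'VARCHAR(50)', 1, None, 1),
    (1, 'Firstname', 'TEXT', 1, None, 0),
    (2, 'Surname', 'TEXT', 1, None, 0),
    (3, 'Callname', 'TEXT', 1, None, 0),
    (4, 'Gender', 'INTEGER', 1, None, 0),
] * 1000


def with_itemgetter():
    return tuple(map(itemgetter(1), data))


def with_genexpr():
    return tuple(t[1] for t in data)


# Take the best of a few repeats to reduce noise from other processes.
for func in (with_itemgetter, with_genexpr):
    best = min(timeit.repeat(func, number=100, repeat=3))
    print(f"{func.__name__}: {best / 100 * 1e6:.1f} usec per call")
```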

Upvotes: 2

abhiarora

Reputation: 10440

You are looking for a generator expression.

print(tuple(i[1] for i in inputlist))

Or

t = tuple(i[1] for i in inputlist)
print(t)

Outputs:

('handle', 'Firstname', 'Surname', 'Callname', 'Gender')

A possible solution with a for loop (not recommended):

li = []
for i in inputlist:
    li.append(i[1])
print(tuple(li))

"What would be the most efficient way of accomplishing this without enumerating through them and creating a new tuple or is this the only way?"

I am not sure why you want to avoid creating a tuple, but you don't need enumerate. Maybe the example below can help:

def getElement(ndx):
    return inputlist[ndx][1]

# Get the name (second field) from the tuple at index 2, i.e. the third tuple
print(getElement(2))

Upvotes: 3
