wwwilliam

Reputation: 9592

What is the speed difference between Python's set() and set([])?

Is there a big difference in speed in these two code fragments?

1.

x = set( i for i in data )

versus:

2.

x = set( [ i for i in data ] )

I've seen people recommending set() instead of set([]); is this just a matter of style?

Upvotes: 1

Views: 366

Answers (2)

mgilson

Reputation: 309841

The form

x = set(i for i in data)

is shorthand for:

x = set((i for i in data))

This creates a generator expression, which is evaluated lazily. Compare that to:

x = set([i for i in data])

which creates an entire list before passing it to set.
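As a small sketch of the difference between the two arguments (the variable names here are only illustrative), you can inspect what actually gets handed to set in each case. All three spellings should produce the same set:

```python
import types

data = "xyzzfoobarbaz"

gen = (i for i in data)   # generator expression: nothing is computed yet
lst = [i for i in data]   # list comprehension: the full list exists right away

is_generator = isinstance(gen, types.GeneratorType)
is_list = type(lst) is list

# The bare form, the explicitly parenthesized form, and the list form
# all build the same set in the end.
same_result = (
    set(i for i in data) == set((i for i in data)) == set([i for i in data])
)

print(is_generator, is_list, same_result)
```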


From a performance standpoint, generator expressions allow short-circuiting in certain functions (all and any come to mind) and take less memory, since you don't need to store the extra list. In some cases this can be very significant.
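To see the short-circuiting in action, here is a minimal sketch. The noisy helper is hypothetical and exists only to record how many items were actually pulled from the input:

```python
def noisy(values, seen):
    """Yield each value, recording it in `seen` as it is consumed."""
    for v in values:
        seen.append(v)
        yield v

data = [0, 0, 1, 0, 0]

# Generator expression: any() stops pulling items at the first truthy one.
seen_gen = []
result_gen = any(x for x in noisy(data, seen_gen))

# List comprehension: the whole list is built before any() even starts.
seen_list = []
result_list = any([x for x in noisy(data, seen_list)])

print(len(seen_gen), len(seen_list))
```

With the generator, only the items up to and including the first truthy value are consumed; with the list, every item is consumed regardless. Note that set itself never short-circuits, since it has to look at every element.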

If you are actually going to iterate over the entire iterable data, and memory isn't a problem for you, I've found that the list comprehension is typically slightly faster than the equivalent generator expression*.

temp $ python -m timeit 'set(i for i in "xyzzfoobarbaz")'
100000 loops, best of 3: 3.55 usec per loop
temp $ python -m timeit 'set([i for i in "xyzzfoobarbaz"])'
100000 loops, best of 3: 3.42 usec per loop

Note that if speed is what you care about, your fastest bet will probably be just:

x = set(data)

proof:

temp $ python -m timeit 'set("xyzzfoobarbaz")'
1000000 loops, best of 3: 1.83 usec per loop
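If you would rather run the comparison from a script than from the shell, something along these lines should work. This is only a sketch: the dictionary of candidates and the loop count are my own choices, and wrapping each call in a lambda adds a small constant overhead to every measurement, so the absolute numbers will differ from the command-line results above.

```python
import timeit

data = "xyzzfoobarbaz"

# The three ways of building the set discussed above.
candidates = {
    "generator": lambda: set(i for i in data),
    "list comp": lambda: set([i for i in data]),
    "direct": lambda: set(data),
}

loops = 10_000
timings = {}
for name, func in candidates.items():
    # Take the best of 3 repeats, as the timeit command line does.
    timings[name] = min(timeit.repeat(func, number=loops, repeat=3))

for name, seconds in timings.items():
    print(f"{name:10s} {seconds / loops * 1e6:.2f} usec per loop")
```

Timings depend on the machine, the Python version, and the size of data, so measure with input that resembles your real workload.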

*CPython only. I don't know how Jython or PyPy optimize this stuff.

Upvotes: 6

The [] syntax creates a list, which is discarded immediately after the set is created. So you are temporarily increasing the memory footprint of the program.

The generator syntax avoids that.
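A rough way to see the size of that temporary list, assuming CPython, is sys.getsizeof. It only reports the container's own size (not the elements it refers to), so treat the numbers as a lower bound rather than a full memory profile:

```python
import sys

data = range(100_000)

as_list = [i for i in data]   # temporary list holding every element
as_gen = (i for i in data)    # generator object: small, fixed-size state

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

print(f"list:      {list_bytes} bytes")
print(f"generator: {gen_bytes} bytes")

# Either way, the resulting set is the same.
same_set = set(as_list) == set(as_gen)
```

The list grows with the length of data, while the generator object stays the same size no matter how many items it will eventually yield.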

Upvotes: 3
