Reputation: 7231
I recently compared the processing speeds of []
and list()
and was surprised to discover that []
runs more than three times faster than list()
. I ran the same test with {}
and dict()
and the results were practically identical: []
and {}
both took around 0.128sec / million cycles, while list()
and dict()
took roughly 0.428sec / million cycles each.
Why is this? Do []
and {}
(and probably ()
and ''
, too) immediately pass back copies of some empty stock literal while their explicitly-named counterparts (list()
, dict()
, tuple()
, str()
) fully go about creating an object, whether or not they actually have elements?
I have no idea how these two methods differ but I'd love to find out. I couldn't find an answer in the docs or on SO, and searching for empty brackets turned out to be more problematic than I'd expected.
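For what it's worth, a quick interactive check (any recent CPython behaves the same way here) shows that [] does not hand back a shared stock object; every evaluation builds a fresh list:
>>> a = []
>>> b = []
>>> a is b        # False: each evaluation of [] builds a brand-new list
False
>>> a.append(1)
>>> b             # b is unaffected, so [] is not a shared object
[]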
I got my timing results by calling timeit.timeit("[]")
and timeit.timeit("list()")
, and timeit.timeit("{}")
and timeit.timeit("dict()")
, to compare lists and dictionaries, respectively. I'm running Python 2.7.9.
I recently discovered this question (https://stackoverflow.com/questions/18123965/why-if-true-is-slower-than-if-1) that compares the performance of if True
to if 1
and seems to touch on a similar literal-versus-global scenario; perhaps it's worth considering as well.
Upvotes: 800
Views: 90216
Reputation: 226624
Both []
and list()
run the same underlying code to create a list. Ultimately they invoke the C function, PyList_New()
.
They differ in calling overhead. In general, built-in syntax such as []
is more direct than calling a type constructor like list()
. The latter involves a lookup to __builtins__
to find list
, building an argument tuple, calling the function, having the argument tuple parsed, handling cases like an iterable input, etc.
Considerable work is going into optimizing the Python interpreter. With builtins caching, call-site caching, and ongoing work to implement JIT compilation, the differences should narrow. Possibly, at some point in the future, the two will be almost identical.
Some small difference may remain due to intrinsic semantic differences.
The list()
form will always have to make some check to make sure the builtin function has not been shadowed in globals or locals. In contrast, []
will always make a real list.
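A quick illustration of that shadowing check (interactive session; the lambda is just a stand-in for any assignment that shadows the name):
>>> list = lambda: 'shadowed'   # shadow the builtin name in globals
>>> list()                      # the call now finds the shadowing object
'shadowed'
>>> []                          # the literal still builds a real builtin list
[]
>>> del list                    # remove the shadow; list() is the builtin again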
The PyPy interpreter shows what is possible with a tracing JIT.
% pypy3 -m timeit '[]'
WARNING: timeit is a very unreliable tool. use pyperf or something else for real measurements
pypy3 -m pip install pyperf
pypy3 -m pyperf timeit '[]'
------------------------------------------------------------
500000000 loops, average of 7: 0.586 +- 0.00304 nsec per loop (using standard deviation)
raymond@Raymonds-MacBook-Pro ~ % pypy3 -m timeit 'list()'
WARNING: timeit is a very unreliable tool. use pyperf or something else for real measurements
pypy3 -m pip install pyperf
pypy3 -m pyperf timeit 'list()'
------------------------------------------------------------
500000000 loops, average of 7: 0.582 +- 0.000292 nsec per loop (using standard deviation)
As you can see, with top-notch JIT compilation, the two techniques are almost identical.
Upvotes: 1
Reputation: 74675
list()
requires a global lookup and a function call but []
compiles to a single instruction. See:
Python 2.7.3
>>> import dis
>>> dis.dis(lambda: list())
1 0 LOAD_GLOBAL 0 (list)
3 CALL_FUNCTION 0
6 RETURN_VALUE
>>> dis.dis(lambda: [])
1 0 BUILD_LIST 0
3 RETURN_VALUE
Upvotes: 169
Reputation: 1124298
Because []
and {}
are literal syntax. Python can create bytecode just to create the list or dictionary objects:
>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
1 0 BUILD_LIST 0
3 RETURN_VALUE
>>> dis.dis(compile('{}', '', 'eval'))
1 0 BUILD_MAP 0
3 RETURN_VALUE
list()
and dict()
are separate objects. Their names need to be resolved, the stack has to be involved to push the arguments, the frame has to be stored to retrieve later, and a call has to be made. That all takes more time.
For the empty case, that means you have at the very least a LOAD_NAME
(which has to search through the global namespace as well as the builtins
module) followed by a CALL_FUNCTION
, which has to preserve the current frame:
>>> dis.dis(compile('list()', '', 'eval'))
1 0 LOAD_NAME 0 (list)
3 CALL_FUNCTION 0
6 RETURN_VALUE
>>> dis.dis(compile('dict()', '', 'eval'))
1 0 LOAD_NAME 0 (dict)
3 CALL_FUNCTION 0
6 RETURN_VALUE
You can time the name lookup separately with timeit
:
>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119
The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:
>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125
So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39
seconds per 10 million calls.
You can avoid the global lookup cost by aliasing the global names as locals (using a timeit
setup, everything you bind to a name is a local):
>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137
but you never can overcome that CALL_FUNCTION
cost.
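To see why, note that even with the alias a call opcode is still emitted; only the name lookup gets cheaper. A small sketch (the opcode is CALL_FUNCTION on older versions and CALL/PRECALL variants on newer ones, so the disassembly is left for you to run; make and _list are just illustrative names):
import dis

def make(_list=list):   # bind the builtin to a fast local, as the timeit setup does
    return _list()

# The lookup is now a cheap LOAD_FAST, but a call opcode is still emitted for
# _list(), whereas [] compiles to a single BUILD_LIST with no call at all.
dis.dis(make)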
Upvotes: 847
Reputation: 395693
Why is
[]
faster than list()
?
The biggest reason is that Python treats list()
just like a user-defined function, which means you can intercept it by aliasing something else to list
and do something different (like use your own subclassed list or perhaps a deque).
With [], Python immediately creates a new instance of a builtin list.
My explanation seeks to give you the intuition for this.
[]
is commonly known as literal syntax.
In the grammar, this is referred to as a "list display". From the docs:
A list display is a possibly empty series of expressions enclosed in square brackets:
list_display ::= "[" [starred_list | comprehension] "]"
A list display yields a new list object, the contents being specified by either a list of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a comprehension is supplied, the list is constructed from the elements resulting from the comprehension.
In short, this means that a builtin object of type list
is created.
There is no circumventing this - which means Python can do it as quickly as it can.
On the other hand, list()
can be intercepted and prevented from creating a builtin list
with the builtin list constructor.
For example, say we want our lists to be created noisily:
class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')
We could then intercept the name list
in the module-level global scope, and then when we create a list
, we actually create our subtyped list:
>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>
Similarly we could remove it from the global namespace
del list
and put it in the builtin namespace:
import builtins
builtins.list = List
And now:
>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>
And note that the list display creates a list unconditionally:
>>> list_1 = []
>>> type(list_1)
<class 'list'>
We probably only do this temporarily, so let's undo our changes - first remove the new List
object from the builtins:
>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined
Oh, no, we lost track of the original.
Not to worry, we can still get list
- it's the type of a list literal:
>>> builtins.list = type([])
>>> list()
[]
So...
Why is
[]
faster than list()
?
As we've seen - we can overwrite list
- but we can't intercept the creation of the literal type. When we use list
we have to do the lookups to see if anything is there.
Then we have to call whatever callable we have looked up. From the grammar:
A call calls a callable object (e.g., a function) with a possibly empty series of arguments:
call ::= primary "(" [argument_list [","] | comprehension] ")"
We can see that it does the same thing for any name, not just list:
>>> import dis
>>> dis.dis('list()')
1 0 LOAD_NAME 0 (list)
2 CALL_FUNCTION 0
4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
1 0 LOAD_NAME 0 (doesnotexist)
2 CALL_FUNCTION 0
4 RETURN_VALUE
For []
there is no function call at the Python bytecode level:
>>> dis.dis('[]')
1 0 BUILD_LIST 0
2 RETURN_VALUE
It simply goes straight to building the list without any lookups or calls at the bytecode level.
We have demonstrated that list
can be intercepted with user code using the scoping rules, and that list()
looks for a callable and then calls it.
Whereas []
is a list display, or a literal, and thus avoids the name lookup and function call.
Upvotes: 23
Reputation: 160637
The answers here are great, to the point, and fully cover this question. I'll drop a further step down from bytecode for those interested. I'm using the most recent CPython repository; older versions behave similarly in this regard, but slight changes may be in place.
Here's a breakdown of the execution for each of these, BUILD_LIST
for []
and CALL_FUNCTION
for list()
.
BUILD_LIST
instruction: You should just view the horror:
PyObject *list = PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();
Terribly convoluted, I know. This is how simple it is:
- Create a new list with PyList_New (this mainly allocates the memory for a new list object), oparg signalling the number of arguments on the stack. Straight to the point.
- Check that nothing went wrong with if (list == NULL).
- Add any arguments (none in the empty case) located on the stack with PyList_SET_ITEM (a macro).
No wonder it is fast! It's custom-made for creating new lists, nothing else :-)
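To see the oparg and the stack items from the Python side, you can disassemble a non-empty display (a small sketch; run it yourself, since opcode offsets vary between versions):
import dis

# The three names are loaded onto the stack first; BUILD_LIST's oparg (3) then
# tells the interpreter how many items to pop off the stack into the new list.
dis.dis('[a, b, c]')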
CALL_FUNCTION
instruction: Here's the first thing you see when you peek at the code handling CALL_FUNCTION
:
PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();
Looks pretty harmless, right? Well, no, unfortunately not: call_function is not a straightforward guy that will call the function immediately; it can't. Instead, it grabs the object from the stack, grabs all the arguments off the stack, and then switches based on the type of the object. Is it:
- A PyCFunction_Type? Nope, it is list, and list isn't of type PyCFunction.
- A PyMethodType? Nope, see previous.
- A PyFunctionType? Nope, see previous.
We're calling the list type; the argument passed in to call_function is PyList_Type. CPython now has to call a generic function to handle any callable object, named _PyObject_FastCallKeywords; yay, more function calls.
This function again makes some checks for certain function types (for reasons I cannot quite understand) and then, after creating a dict for kwargs if required, goes on to call _PyObject_FastCallDict
.
_PyObject_FastCallDict
finally gets us somewhere! After performing even more checks it grabs the tp_call
slot from the type
of the type
we've passed in, that is, it grabs type.tp_call
. It then proceeds to create a tuple out of the arguments passed in with _PyStack_AsTuple
and, at long last, a call can finally be made!
tp_call
, which matches type.__call__
takes over and finally creates the list object. It calls the list's __new__
which corresponds to PyType_GenericNew
and allocates memory for it with PyType_GenericAlloc
: This is actually the part where it catches up with PyList_New
, finally. All of the previous steps are necessary to handle objects in a generic fashion.
In the end, type_call
calls list.__init__
and initializes the list with any available arguments, then we return back the way we came. :-)
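A rough Python-level mirror of that dispatch, just to make its shape concrete (a simplified sketch of what type.__call__ does, not the real CPython fast path; the helper name is made up for illustration):
def construct_like_type_call(cls, *args, **kwargs):
    # Roughly what type.__call__ does: run __new__, then __init__ if the
    # result is an instance of cls. The literal [] skips all of this.
    obj = cls.__new__(cls, *args, **kwargs)
    if isinstance(obj, cls):
        cls.__init__(obj, *args, **kwargs)
    return obj

print(construct_like_type_call(list))         # []
print(construct_like_type_call(list, "abc"))  # ['a', 'b', 'c']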
Finally, remember the LOAD_NAME
, that's another guy that contributes here.
It's easy to see that, when dealing with our input, Python generally has to jump through hoops in order to actually find out the appropriate C
function to do the job. It doesn't have the courtesy of immediately calling it because it's dynamic: someone might mask list
(and boy, do many people do it), and another path must be taken.
This is where list()
loses out: all the exploring Python needs to do to find out what the heck it should do.
Literal syntax, on the other hand, means exactly one thing; it cannot be changed and always behaves in a pre-determined way.
Footnote: All function names are subject to change from one release to the next. The point still stands and most likely will stand in any future version; it's the dynamic look-up that slows things down.
Upvotes: 33
Reputation: 23500
Because list
is a function to convert, say, a string to a list object, while []
is used to create a list off the bat. Try this (might make more sense to you):
x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]
While
y = ["wham bam"]
>>> y
["wham bam"]
Gives you an actual list containing whatever you put in it.
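Put differently, here is a minimal sketch of that difference (nothing version-specific):
empty_a = []            # literal: builds an empty builtin list directly
empty_b = list()        # constructor: resolves the name, then calls it
chars   = list("abc")   # ['a', 'b', 'c'] -- the constructor consumes any iterable
wrapped = ["abc"]       # ['abc'] -- the literal wraps the single element as-is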
Upvotes: 79