Reputation: 1420
Let's say we have a dict that will always have the keys first_name and last_name, but whose values may be None.
{
    'first_name': None,
    'last_name': 'Bloggs'
}
We want to save the first name if it is passed in or save it as an empty string if None is passed in.
first_name = account['first_name'] if account['first_name'] else ""
vs
first_name = account['first_name'] or ""
Both of these work, but what is the difference behind the scenes? Is one more efficient than the other?
Upvotes: 19
Views: 4579
Reputation: 73450
Owing to its greater flexibility, there is more going on behind the scenes in the first version. After all, a if b else c is an expression with 3 possibly distinct input variables/expressions, while a or b is binary. You can disassemble the expressions to get a better idea:
def a(x):
    return x if x else ''

def b(x):
    return x or ''
>>> import dis
>>> dis.dis(a)
2 0 LOAD_FAST 0 (x)
2 POP_JUMP_IF_FALSE 8
4 LOAD_FAST 0 (x)
6 RETURN_VALUE
>> 8 LOAD_CONST 1 ('')
10 RETURN_VALUE
>>> dis.dis(b)
2 0 LOAD_FAST 0 (x)
2 JUMP_IF_TRUE_OR_POP 6
4 LOAD_CONST 1 ('')
>> 6 RETURN_VALUE
Upvotes: 6
Reputation: 50076
TLDR: It does not matter. If you care about correctness, you should instead compare against None.
account['first_name'] if account['first_name'] is not None else ""
There is a notable impact from whether account['first_name'] is mostly None or an actual value - however, this is at the nanosecond scale. It is negligible unless run in a very tight loop.
If you seriously require better performance, you should consider using a JIT or static compiler such as PyPy, Cython, or similar.
Python makes many guarantees that what you write is what is executed. That means the a if a else b case evaluates a at most twice. In contrast, a or b evaluates a exactly once.
In their disassembly, you can see that the LOAD_NAME, LOAD_CONST and BINARY_SUBSCR happen twice for the first case - but only if the value is true-ish. If it is false-ish, the number of lookups is the same!
dis.dis('''account['first_name'] if account['first_name'] else ""''')
1 0 LOAD_NAME 0 (account)
2 LOAD_CONST 0 ('first_name')
4 BINARY_SUBSCR
6 POP_JUMP_IF_FALSE 16
8 LOAD_NAME 0 (account)
10 LOAD_CONST 0 ('first_name')
12 BINARY_SUBSCR
14 RETURN_VALUE
>> 16 LOAD_CONST 1 ('')
18 RETURN_VALUE
dis.dis('''account['first_name'] or ""''')
1 0 LOAD_NAME 0 (account)
2 LOAD_CONST 0 ('first_name')
4 BINARY_SUBSCR
6 JUMP_IF_TRUE_OR_POP 10
8 LOAD_CONST 1 ('')
>> 10 RETURN_VALUE
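The lookup counts described above can also be checked without reading bytecode. This is only a rough sketch: CountingDict and count_lookups are hypothetical helpers made up for illustration, not part of any library.

```python
class CountingDict(dict):
    """Hypothetical dict subclass that counts subscript lookups."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __getitem__(self, key):
        self.lookups += 1
        return super().__getitem__(key)


def count_lookups(expression, first_name):
    """Evaluate the expression against a fresh account; return the lookup count."""
    account = CountingDict(first_name=first_name, last_name="Bloggs")
    eval(expression, {"account": account})
    return account.lookups


conditional = "account['first_name'] if account['first_name'] else ''"
boolean_or = "account['first_name'] or ''"

for value in ("Joe", None):
    print(repr(value), count_lookups(conditional, value), count_lookups(boolean_or, value))
```

The conditional expression should report two lookups for a true-ish value and one for None, while the or version reports one lookup in both cases.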
Technically, the statements also perform a different check: boolean false-ness (POP_JUMP_IF_FALSE) versus boolean truth-ness (JUMP_IF_TRUE_OR_POP). Since this is a single operation, it is optimised inside the interpreter and the difference is negligible.
For builtin types, you can generally assume that operations are "fast" - meaning that any non-trivial control flow takes significantly more time. Unless you run this in a tight loop over thousands of accounts, it will not have a notable impact.
While in your case it does not make an observable difference, it is usually better to explicitly test is not None. This lets you distinguish between None and other false-ish values, such as False, [] or "", that may be valid.
account['first_name'] if account['first_name'] is not None else ""
Strictly speaking, it is the least efficient. On top of the added lookup, there is an additional lookup for None and comparison for is not.
dis.dis('''account['first_name'] if account['first_name'] is not None else ""''')
1 0 LOAD_NAME 0 (account)
2 LOAD_CONST 0 ('first_name')
4 BINARY_SUBSCR
6 LOAD_CONST 1 (None)
8 COMPARE_OP 9 (is not)
10 POP_JUMP_IF_FALSE 20
12 LOAD_NAME 0 (account)
14 LOAD_CONST 0 ('first_name')
16 BINARY_SUBSCR
18 RETURN_VALUE
>> 20 LOAD_CONST 2 ('')
22 RETURN_VALUE
Note that this test can actually be faster. An is not None test compares for identity - that is, a builtin pointer comparison. Especially for custom types, this is faster than looking up and evaluating a custom __bool__ or even __len__ method.
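To make the point about custom types concrete, here is a small sketch. The Name class is a hypothetical wrapper invented for this example; it simply counts how often its __bool__ hook runs.

```python
class Name:
    """Hypothetical wrapper type with a custom truthiness hook."""

    bool_calls = 0

    def __init__(self, text):
        self.text = text

    def __bool__(self):
        Name.bool_calls += 1
        return bool(self.text)


name = Name("Joe")

Name.bool_calls = 0
result = name if name else ""
calls_truthiness = Name.bool_calls  # the truthiness test invoked __bool__

Name.bool_calls = 0
result = name if name is not None else ""
calls_identity = Name.bool_calls  # the identity test never calls __bool__

print(calls_truthiness, calls_identity)
```

The truthiness test has to call back into Python code, whereas the identity test does not touch the object's methods at all.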
In practice, the added lookup will not have a noticeable performance difference. It is up to you whether you prefer the shorter a or b or the more robust a if a is not None else b. Using a if a else b gets you neither terseness nor correctness, so it should be avoided.
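The correctness difference between the three forms can be sketched as follows. The function names are made up for illustration only.

```python
def with_or(value):
    return value or ""


def with_truthiness(value):
    return value if value else ""


def with_none_check(value):
    return value if value is not None else ""


# Compare how each form treats None and other false-ish values.
for value in (None, "Joe", "", 0, False, []):
    print(
        repr(value),
        repr(with_or(value)),
        repr(with_truthiness(value)),
        repr(with_none_check(value)),
    )
```

Only the is not None variant keeps 0, False and [] intact; the other two replace every false-ish value with the empty string.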
Here are the numbers from Python 3.6.4, perf timeit:
# a is None
a or b | 41.4 ns +- 2.1 ns
a if a else b | 41.4 ns +- 2.4 ns
a if a is not None else b | 50.5 ns +- 4.4 ns
# a is not None
a or b | 41.0 ns +- 2.1 ns
a if a else b | 69.9 ns +- 5.0 ns
a if a is not None else b | 70.2 ns +- 5.4 ns
As you can see, there is an impact from the value of a - if you care about tens of nanoseconds. The terser statement with fewer underlying instructions is faster, and more importantly stable. There is no significant penalty for the added is not None check.
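If you want to get comparable numbers on your own machine, something along these lines should work. Note the assumptions: it uses the standard library's timeit rather than perf, a and b are plain global names here, and measure is a hypothetical helper, so the absolute figures will not match the table above.

```python
import timeit

EXPRESSIONS = (
    "a or b",
    "a if a else b",
    "a if a is not None else b",
)


def measure(a, number=100_000, repeat=5):
    """Return the best per-evaluation time in nanoseconds for each expression."""
    results = {}
    for expression in EXPRESSIONS:
        best = min(
            timeit.repeat(
                expression,
                globals={"a": a, "b": ""},
                number=number,
                repeat=repeat,
            )
        )
        results[expression] = best / number * 1e9
    return results


for case, a in (("a is None", None), ("a is not None", "Joe")):
    print("#", case)
    for expression, nanoseconds in measure(a).items():
        print(f"{expression:<26} | {nanoseconds:.1f} ns")
```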
Either way, if you care about performance - do not optimise for CPython! If you need speed, using a JIT/static compiler gives significantly more gain. However, their optimisations make instruction counts as performance metrics misleading.
For pure-Python code, as in your case, the PyPy interpreter is an obvious choice. On top of being faster in general, it seems to optimise the is not None test. Here are the numbers from PyPy 5.8.0-beta0, perf timeit:
# a is None
a or b | 10.5 ns +- 0.7 ns
a if a else b | 10.7 ns +- 0.8 ns
a if a is not None else b | 10.1 ns +- 0.8 ns
# a is not None
a or b | 11.2 ns +- 1.0 ns
a if a else b | 11.3 ns +- 1.0 ns
a if a is not None else b | 10.2 ns +- 0.6 ns
Bottom line, do not try to gain performance by optimising for byte code instructions. Even if you are sure that this is a bottleneck (by profiling your application), such optimisations are generally not worth it. A faster runtime gives significantly more gain, and may not even have the same penalties for byte code instructions.
Upvotes: 3
Reputation: 387547
result = value if value else ""
This is the ternary conditional operator and basically equivalent to the following if statement:
if value:
    result = value
else:
    result = ""
It is very explicit and allows you to describe exactly what condition you require. In this case, it just looks at the truthiness of value, but you could easily expand this to make a strict test against None, for example:
result = value if value is not None else ""
This would, for example, retain falsy values like False or 0.
value or ""
This uses the boolean or operator:
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
So this is basically a way to get the first truthy value (defaulting to the right operand). So this does the same as value if value else "". Unlike the conditional operator, it does not support other checks though, so you can only check for truthiness here.
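A short sketch of that behaviour: or returns one of its operands rather than a bool, and it only evaluates the right operand when it needs to. The fallback function is a hypothetical helper that records when it gets called.

```python
evaluated = []


def fallback():
    """Hypothetical default provider that records each call."""
    evaluated.append("fallback")
    return ""


first = "Joe" or fallback()   # left operand is truthy, fallback() is skipped
second = None or fallback()   # left operand is falsy, fallback() is evaluated
chained = None or "" or "Joe" or "Bloggs"   # first truthy operand wins

print(repr(first), repr(second), repr(chained), evaluated)
```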
In your case, where you want to just check against None and fall back to an empty string, there’s no difference at all. Just choose what is most understandable to you. From a “pythonic” point of view, one would probably prefer the or operator, as that’s also a bit shorter.
From a performance standpoint, the conditional operator is slightly more expensive in this case, as this needs to evaluate the dictionary access twice. In practice that won’t be noticeable though, especially not for a dictionary access.
If you do believe that this could have an impact on your application performance, then you should not believe in numbers you get from isolated benchmarks of a single statement; instead, you should profile your application and then try to identify bottlenecks which you can then improve on. I’ll guarantee you that it will be a long way before a second dictionary access will have any impact.
So yes, you can totally ignore the performance arguments for this. Just choose whatever you prefer, what makes the most sense to you. Also consider whether you just want a truthiness check, or whether a strict check against None would be better.
Upvotes: 1
Reputation: 394835
What is the difference between the two following expressions?
first_name = account['first_name'] if account['first_name'] else ""
vs
first_name = account['first_name'] or ""
The primary difference is that the first, in Python, is the conditional expression,
The expression x if C else y first evaluates the condition, C rather than x. If C is true, x is evaluated and its value is returned; otherwise, y is evaluated and its value is returned.
while the second uses the boolean operation:
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
Note that the first may require two key lookups versus the second, which only requires one key lookup.
This lookup is called subscript notation:
name[subscript_argument]
Subscript notation exercises the __getitem__ method of the object referenced by name.
It requires both the name and the subscript argument to be loaded.
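As a small illustration of subscript notation going through __getitem__, here is a sketch with a made-up Account class (hypothetical, not a dict) that records every key it is asked for.

```python
class Account:
    """Hypothetical record type, used only to show what subscripting calls."""

    def __init__(self, **fields):
        self._fields = fields
        self.requested = []

    def __getitem__(self, key):
        self.requested.append(key)
        return self._fields[key]


account = Account(first_name=None, last_name="Bloggs")

last_name = account["last_name"]                        # subscript notation
same = type(account).__getitem__(account, "last_name")  # the equivalent explicit call

print(last_name, same, account.requested)
```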
Now, in the context of the question, if the value tests as True in a boolean context (which a non-empty string does, but None does not), the conditional expression will require a second (redundant) loading of both the dictionary and the key, while the boolean or operation simply returns the result of the first lookup.
Therefore I would expect the second, the boolean operation, to be slightly more efficient in cases where the value is not None.
Others have compared the bytecode generated by both expressions.
However, the AST represents the first breakdown of the language as parsed by the interpreter.
The following AST demonstrates that the second lookup likely involves more work (note I have formatted the output for easier parsing):
>>> import ast
>>> print(ast.dump(ast.parse("account['first_name'] if account['first_name'] else ''").body[0]))
Expr(
    value=IfExp(
        test=Subscript(value=Name(id='account', ctx=Load()),
                       slice=Index(value=Str(s='first_name')), ctx=Load()),
        body=Subscript(value=Name(id='account', ctx=Load()),
                       slice=Index(value=Str(s='first_name')), ctx=Load()),
        orelse=Str(s='')
))
versus
>>> print(ast.dump(ast.parse("account['first_name'] or ''").body[0]))
Expr(
    value=BoolOp(
        op=Or(),
        values=[
            Subscript(value=Name(id='account', ctx=Load()),
                      slice=Index(value=Str(s='first_name')), ctx=Load()),
            Str(s='')]
    )
)
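The exact text of ast.dump varies between Python versions (newer releases no longer print Index or Str nodes), so as a version-independent check you could count the Subscript nodes instead. This is just a sketch; count_subscripts is a hypothetical helper.

```python
import ast


def count_subscripts(source):
    """Count the Subscript nodes in the AST of a single expression."""
    tree = ast.parse(source, mode="eval")
    return sum(isinstance(node, ast.Subscript) for node in ast.walk(tree))


conditional = "account['first_name'] if account['first_name'] else ''"
boolean_or = "account['first_name'] or ''"

print(count_subscripts(conditional), count_subscripts(boolean_or))
```

The conditional expression contains two subscript nodes (one in the test, one in the body), the boolean operation only one.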
Here we see that the bytecode for the conditional expression is much longer. This usually bodes poorly for relative performance in my experience.
>>> import dis
>>> dis.dis("d['name'] if d['name'] else ''")
1 0 LOAD_NAME 0 (d)
2 LOAD_CONST 0 ('name')
4 BINARY_SUBSCR
6 POP_JUMP_IF_FALSE 16
8 LOAD_NAME 0 (d)
10 LOAD_CONST 0 ('name')
12 BINARY_SUBSCR
14 RETURN_VALUE
>> 16 LOAD_CONST 1 ('')
18 RETURN_VALUE
For the boolean operation, it's almost half as long:
>>> dis.dis("d['name'] or ''")
1 0 LOAD_NAME 0 (d)
2 LOAD_CONST 0 ('name')
4 BINARY_SUBSCR
6 JUMP_IF_TRUE_OR_POP 10
8 LOAD_CONST 1 ('')
>> 10 RETURN_VALUE
Here I would expect the performance to be much quicker relative to the other.
Therefore, let's see if there's much difference in performance then.
Performance is not very important here, but sometimes I have to see for myself:
import timeit

def cond(name=False):
    d = {'name': 'thename' if name else None}
    return lambda: d['name'] if d['name'] else ''

def bool_op(name=False):
    d = {'name': 'thename' if name else None}
    return lambda: d['name'] or ''
We see that when the name is set (the value is not None), the boolean operation is about 10% faster than the conditional.
>>> min(timeit.repeat(cond(name=True), repeat=10))
0.11814919696189463
>>> min(timeit.repeat(bool_op(name=True), repeat=10))
0.10678509017452598
However, when the name is not set (the value is None), we see that there is almost no difference:
>>> min(timeit.repeat(cond(name=False), repeat=10))
0.10031125508248806
>>> min(timeit.repeat(bool_op(name=False), repeat=10))
0.10030031995847821
In general, I would prefer the or boolean operation to the conditional expression - with the following caveats:
[...] None.
In the case where either of the above is not true, I would prefer the following for correctness:
first_name = account['first_name']
if first_name is None:
    first_name = ''
The upsides are that
[...] is None is quite fast, [...]
This should also not be any less performant:
def correct(name=False):
    d = {'name': 'thename' if name else None}
    def _correct():
        first_name = d['name']
        if first_name is None:
            first_name = ''
    return _correct
We see that we get quite competitive performance when the value is set:
>>> min(timeit.repeat(correct(name=True), repeat=10))
0.10948465298861265
>>> min(timeit.repeat(cond(name=True), repeat=10))
0.11814919696189463
>>> min(timeit.repeat(bool_op(name=True), repeat=10))
0.10678509017452598
When the value is None, it is not quite as good though:
>>> min(timeit.repeat(correct(name=False), repeat=10))
0.11776355793699622
>>> min(timeit.repeat(cond(name=False), repeat=10))
0.10031125508248806
>>> min(timeit.repeat(bool_op(name=False), repeat=10))
0.10030031995847821
The difference between the conditional expression and the boolean operation is two versus one lookups respectively on a True condition, making the boolean operation more performant.
For correctness's sake, however, do the lookup one time, check for identity to None with is None, and then reassign to the empty string in that case.
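If this pattern comes up often, the single-lookup approach could be wrapped in a small helper. This is only a sketch; value_or_default is a name invented for this example.

```python
def value_or_default(mapping, key, default=""):
    """Look the key up once and replace only None with the default."""
    value = mapping[key]
    return default if value is None else value


account = {"first_name": None, "last_name": "Bloggs"}

first_name = value_or_default(account, "first_name")
last_name = value_or_default(account, "last_name")

print(repr(first_name), repr(last_name))
```

Unlike the or version, this keeps other false-ish values such as 0 or False untouched, and it still raises KeyError for a missing key.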
Upvotes: 9
Reputation: 2041
I know this doesn't answer your question about efficiency or the difference behind the scenes, but I'd like to point out that I think the following code is preferable:
first_name = account.get('first_name') or ''
This way you don't have to access account['first_name'] twice.
Another side effect of this solution (obviously it depends on whether you want this behavior or not) is that you'll never get a KeyError, even if first_name is not in the account dict. Obviously, if you prefer to see the KeyError, that’s fine too.
The documentation for dict's get is here: https://docs.python.org/3/library/stdtypes.html#dict.get
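Here is a quick sketch of how this behaves for the three situations you might run into. The sample dicts and the first_name_of helper are made up for illustration.

```python
def first_name_of(account):
    """Return the first name, or '' if it is missing or None."""
    return account.get("first_name") or ""


with_value = {"first_name": "Joe", "last_name": "Bloggs"}
with_none = {"first_name": None, "last_name": "Bloggs"}
without_key = {"last_name": "Bloggs"}

for account in (with_value, with_none, without_key):
    print(repr(first_name_of(account)))

# A default passed to get() only applies to a missing key, not to a stored None.
stored_none = with_none.get("first_name", "")
print(repr(stored_none))
```

That last line is why the or is still needed here: get('first_name', '') on its own hands back the stored None.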
Upvotes: 1
Reputation: 524
For your specific case, the boolean or operator looks more pythonic, and also a very simple benchmark shows that it is slightly more efficient:
import timeit

setup = "account = {'first_name': None, 'last_name': 'Bloggs'}"
statements = {
    'ternary conditional operator': "first_name = account['first_name'] if account['first_name'] else ''",
    'boolean or operator': "first_name = account['first_name'] or ''",
}
for label, statement in statements.items():
    elapsed_best = min(timeit.repeat(statement, setup, number=1000000, repeat=10))
    print('{}: {:.3} s'.format(label, elapsed_best))
Output:
ternary conditional operator: 0.0303 s
boolean or operator: 0.0275 s
Taking into account that the numbers above are the total execution times in seconds (1000000 evaluations per statement), in practice, there is no significant difference in efficiency at all.
Upvotes: 1