Reputation: 397
I have this python function:
def evaluate_corr_level(corr):
    corr_table = {
        eval("corr == 0"): "No correlation",
        eval("corr > 0 and corr <= 0.3"): "Weak correlation",
        eval("corr > 0.3 and corr <= 0.7"): "Moderate correlation",
        eval("corr > 0.7"): "Strong correlation",
        eval("corr < 0 and corr >= -0.3"): "Weak inverse correlation",
        eval("corr < -0.3 and corr >= -0.7"): "Moderate inverse correlation",
        eval("corr < -0.7"): "Strong inverse correlation"
    }
    for k, val in corr_table.items():
        print(k, val)
If I pass 0.5 as a parameter I get printed:
False Strong inverse correlation
True Moderate correlation
Which is not what I expected. What's wrong with the code? I assume it's some string-to-float issue (I'm not very familiar with eval()).
Thanks!
Upvotes: 1
Views: 81
Reputation: 155497
Others have explained why this doesn't work as you expect. I'll explain why what you're trying to do is wrong and how to fix it. The main problem here is that getting switch-like performance in Python only works if the dict can be defined ahead of time; if you define it on every function call, you do all the work of an entire if/elif/else chain without the ability to short-circuit, and with unnecessary temporaries, making it worse than just the naïve approach (which I slightly optimized by reordering to get only 0-1 tests per condition, not 1-2):
if corr < -0.7:
    x = "Strong inverse correlation"
elif corr < -0.3:
    x = "Moderate inverse correlation"
elif corr < 0:
    x = "Weak inverse correlation"
elif corr == 0:
    x = "No correlation"
elif corr <= 0.3:
    x = "Weak correlation"
elif corr <= 0.7:
    x = "Moderate correlation"
else:
    x = "Strong correlation"
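For reference, here's the chain above wrapped as a directly callable function (a minimal sketch; the function name just matches the one in the question):

```python
def evaluate_corr_level(corr):
    # Ordered so each branch adds at most one comparison to the ones before it
    if corr < -0.7:
        return "Strong inverse correlation"
    elif corr < -0.3:
        return "Moderate inverse correlation"
    elif corr < 0:
        return "Weak inverse correlation"
    elif corr == 0:
        return "No correlation"
    elif corr <= 0.3:
        return "Weak correlation"
    elif corr <= 0.7:
        return "Moderate correlation"
    else:
        return "Strong correlation"

print(evaluate_corr_level(0.5))   # Moderate correlation
print(evaluate_corr_level(-0.2))  # Weak inverse correlation
```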
If you want to optimize this for switch-like behavior, you need to define a dict outside the function once, in a way that you can reuse it every time (so you only pay an O(1) lookup cost, not an O(n) dict-building cost as well). It's tricky to do this with floats and probably not worth it, though in this particular case it's technically doable:
import math

rounding_table = (math.floor, math.ceil)  # Looked up with corr >= 0, so floor used for negatives, ceil for the rest

corr_table = {
    **dict.fromkeys(range(-7, -3), "Moderate inverse correlation"),
    **dict.fromkeys(range(-3, 0), "Weak inverse correlation"),
    0: "No correlation",
    **dict.fromkeys(range(1, 4), "Weak correlation"),
    **dict.fromkeys(range(4, 8), "Moderate correlation"),
}

def evaluate_corr_level(corr):
    # Multiply by 10 and round away from 0 to get a normalized integer form matching the table's keys
    norm_corr_key = rounding_table[corr >= 0](corr * 10)
    try:
        # Look up anything that rounded to a value in the table cheaply
        return corr_table[norm_corr_key]
    except KeyError:
        # Outside the range -0.7 to 0.7 it's a strong correlation; the sign
        # tells us whether it's inverse or not
        return "Strong inverse correlation" if corr < 0 else "Strong correlation"
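As a quick sanity check of the normalization step alone (just the sign-selected rounding, reproduced from the code above), each float maps onto the integer keys the table uses:

```python
import math

rounding_table = (math.floor, math.ceil)  # floor for negatives, ceil for the rest

def normalize(corr):
    # Multiply by 10 and round away from zero
    return rounding_table[corr >= 0](corr * 10)

print(normalize(0.5))    # 5  -> "Moderate correlation" bucket
print(normalize(-0.35))  # -4 -> "Moderate inverse correlation" bucket
print(normalize(0.3))    # 3  -> "Weak correlation" bucket
```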
In theory, this replaces 1-6 conditional tests with 2-3 (one for the sign, the implicit "is it in the table" test, and a possible third sign test when it isn't). But the two table lookups, the multiplication by 10, and the rounding step, while relatively fixed-cost, are each more expensive than a simple numeric comparison.
To be clear, this is almost certainly not justified; it probably won't produce performance improvements in real code unless the number of ranges is larger (if the if/elif/else chain had 30 unique cases, it might help, at the expense of a potentially larger lookup table, but for seven, probably not). And it only works because there was a semi-reasonable way to normalize the float inputs to a small range of ints; if the ranges involved tests for values out to the 10th decimal place, the size of the required dict would be ridiculous.
Alternatively, for the float case, you could use a range-based pre-built lookup table in sorted order and search it in O(log n) time with the bisect module:
import bisect
import math

# Make a table that consistently follows the rules from the tests when used with
# bisect.bisect, including distinguishing 0 from just above 0.
# It must be constructed outside the function to avoid rebuilding it every call
# (we need math.nextafter to get the next larger representable value, so
# non-literals get involved)
corr_table = (-0.7, -0.3, 0, math.nextafter(0, math.inf),
              math.nextafter(0.3, math.inf), math.nextafter(0.7, math.inf))

def evaluate_corr_level(corr):
    # Get index into table of strings in O(log n) time
    index = bisect.bisect(corr_table, corr)
    # This table is a tuple of constant literals, so we can use it inline,
    # and Python reuses the constant tuple without rebuilding it.
    # The actual lookup is O(1)
    return ("Strong inverse correlation", "Moderate inverse correlation",
            "Weak inverse correlation", "No correlation", "Weak correlation",
            "Moderate correlation", "Strong correlation")[index]
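As a quick check of the bisect semantics this relies on (bisect.bisect is bisect_right, so a value equal to a boundary lands to its right; math.nextafter requires Python 3.9+):

```python
import bisect
import math

corr_table = (-0.7, -0.3, 0, math.nextafter(0, math.inf),
              math.nextafter(0.3, math.inf), math.nextafter(0.7, math.inf))

# Boundary values land in the same bucket the original if/elif chain assigns them
print(bisect.bisect(corr_table, -0.7))  # 1 -> "Moderate inverse correlation"
print(bisect.bisect(corr_table, 0))     # 3 -> "No correlation"
print(bisect.bisect(corr_table, 0.3))   # 4 -> "Weak correlation"
print(bisect.bisect(corr_table, 0.71))  # 6 -> "Strong correlation"
```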
That's more practical for arbitrary floats. It may not save anything in this case (again, seven distinct cases isn't that many), but it's likely to lose by less: it doesn't need to normalize the input, so it has lower fixed overhead and might start winning sooner than the dict table (while also avoiding the explosion in memory that the dict requires as the floats get more precise).
Upvotes: 3
Reputation: 2525
Using numpy as a replacement for a switch case:
import numpy as np

def evaluate_corr_level(corr, bins):
    statements = ("Strong inverse correlation", "Moderate inverse correlation",
                  "Weak inverse correlation", "No correlation", "Weak correlation",
                  "Moderate correlation", "Strong correlation")
    return statements[np.searchsorted(bins, corr, side='right')]

eps = np.finfo(float).eps  # np.float is removed in modern NumPy; use the builtin
bins_corr = np.array([-0.7, -0.3, 0, eps, 0.3 + eps, 0.7 + eps])
print(evaluate_corr_level(0.71, bins_corr))
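One advantage of this approach worth noting: np.searchsorted is vectorized, so a whole array of coefficients can be classified at once (a sketch using the same bins and labels as above):

```python
import numpy as np

eps = np.finfo(float).eps
bins_corr = np.array([-0.7, -0.3, 0, eps, 0.3 + eps, 0.7 + eps])
statements = np.array(["Strong inverse correlation", "Moderate inverse correlation",
                       "Weak inverse correlation", "No correlation", "Weak correlation",
                       "Moderate correlation", "Strong correlation"])

# Classify many correlation coefficients in one vectorized call
corrs = np.array([-0.9, -0.5, 0.0, 0.2, 0.5, 0.9])
labels = statements[np.searchsorted(bins_corr, corrs, side='right')]
print(labels)
```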
Upvotes: 1
Reputation: 371
A dictionary can only have one of each distinct key. Since you are using eval() for the keys, each key evaluates to either True or False, so there can be at most two keys in this dictionary. Because of this, each eval() result overrides any previous entry with the same key. Your code always prints the last two key:value pairs because their keys, True and False, overwrite all the earlier True and False keys.
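A minimal demonstration of this key collapse, using corr = 0.5 as in the question (shortened to the first four entries):

```python
corr = 0.5
corr_table = {
    eval("corr == 0"): "No correlation",                     # False
    eval("corr > 0 and corr <= 0.3"): "Weak correlation",    # False -> overwrites
    eval("corr > 0.3 and corr <= 0.7"): "Moderate correlation",  # True
    eval("corr > 0.7"): "Strong correlation",                # False -> overwrites again
}
print(corr_table)       # {False: 'Strong correlation', True: 'Moderate correlation'}
print(len(corr_table))  # 2 -- only two distinct keys survive
```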
Upvotes: 1
Reputation: 1068
When you define your dictionary, you are evaluating your conditions into Trues and Falses to use as the keys. However, in a dictionary, keys are unique, so of all the entries keyed by True or False, only the last one of each survives (the earlier ones get overwritten).
Upvotes: 2