Uknowho

Reputation: 397

Strange behavior when iterating a dictionary built with eval

I have this Python function:

def evaluate_corr_level(corr):

    corr_table = {
        eval("corr == 0"): "No correlation",
        eval("corr > 0 and corr <= 0.3"): "Weak correlation",
        eval("corr > 0.3 and corr <= 0.7"): "Moderate correlation",
        eval("corr > 0.7"): "Strong correlation",
        eval("corr < 0 and corr >= -0.3"): "Weak inverse correlation",
        eval("corr < 0.3 and corr >= -0.7"): "Moderate inverse correlation",
        eval("corr < -0.7"): "Strong inverse correlation"
    }

    for k, val in corr_table.items():
        print(k, val)

If I pass 0.5 as the parameter, this is what gets printed:

False Strong inverse correlation
True Moderate correlation

Which is not what I expected. What's wrong with the code? I assume it's some string-to-float issue (I'm not very familiar with eval()).

Thanks!

Upvotes: 1

Views: 81

Answers (4)

ShadowRanger

Reputation: 155497

Others have explained why this doesn't work as you expect; I'll explain why what you're trying to do is wrong and how to fix it. The main problem is that getting switch-like performance in Python only works if the dict can be defined ahead of time. If you build it on every function call, you do all the work of an entire if/elif/else chain without the ability to short-circuit, plus the cost of unnecessary temporaries, making it worse than the naïve approach (which I've slightly optimized below by reordering so each condition needs only 0-1 tests, not 1-2):

if corr < -0.7:
    x = "Strong inverse correlation"
elif corr < -0.3:
    x = "Moderate inverse correlation"
elif corr < 0:
    x = "Weak inverse correlation"
elif corr == 0:
    x = "No correlation"
elif corr <= 0.3:
    x = "Weak correlation"
elif corr <= 0.7:
    x = "Moderate correlation"
else:
    x = "Strong correlation"

If you want to optimize this for switch-like behavior, you need to define the dict outside the function, once, in a way that lets you reuse it on every call (so you pay only an O(1) lookup cost, not an O(n) dict-building cost on top of it). It's tricky to do this with floats and probably not worth it, though in this particular case it's technically doable:

import math

# Indexed with corr >= 0: floor is used for negatives, ceil for the rest,
# i.e. we always round away from zero
rounding_table = (math.floor, math.ceil)

corr_table = {
    **dict.fromkeys(range(-7, -3), "Moderate inverse correlation"),
    **dict.fromkeys(range(-3, 0), "Weak inverse correlation"),
    0: "No correlation",
    **dict.fromkeys(range(1, 4), "Weak correlation"),
    **dict.fromkeys(range(4, 8), "Moderate correlation"),
}

def evaluate_corr_level(corr):
    # Multiply by 10 and round away from 0 to get normalized integer form matching our table's keys
    norm_corr_key = rounding_table[corr >= 0](corr * 10)

    try:
        # Look up anything that rounded to a value in the table cheaply
        return corr_table[norm_corr_key]
    except KeyError:
        # If it's outside the range -0.7 to 0.7, it's a strong correlation;
        # the sign tells us whether it's inverse or not
        return "Strong inverse correlation" if corr < 0 else "Strong correlation"

In theory, this replaces 1-6 conditional tests with 2-3 (one for the sign, the implicit "is it in the table" test, and a possible third test for the sign if it isn't). But the two table lookups, the multiplication by 10, and the rounding step, while relatively fixed in cost, are each more expensive than a simple numeric test.

To be clear, this is almost certainly not justified: it probably won't produce performance improvements in real code unless the number of ranges is much larger (if the if/elif/else chain had 30 unique cases it might help, at the expense of a potentially large lookup table; for seven, probably not). And it only works because there was a semi-reasonable way to normalize the float inputs to a small range of ints; if the ranges involved tests out to the 10th decimal place, the size of the required dict would be ridiculous.
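If you want to check that claim on your own workload, a quick micro-benchmark is easy to sketch with the standard timeit module. This assumes both versions are defined in the current module; the names evaluate_corr_level_dict and evaluate_corr_level_chain are illustrative, not from the code above:

import timeit

# Hypothetical names: point these at the dict-based version and the plain
# if/elif/else version defined above
for fn in ("evaluate_corr_level_dict", "evaluate_corr_level_chain"):
    t = timeit.timeit(f"{fn}(0.5)", globals=globals(), number=1_000_000)
    print(fn, t)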

Alternatively, for the float case, you could use a range-based pre-built lookup table in sorted order and search it in O(log n) time with the bisect module:

import bisect
import math

# Make a table that consistently follows the rules from the tests when used
# with bisect.bisect, including distinguishing 0 from just above 0.
# It must be constructed outside the function to avoid rebuilding it on every
# call (math.nextafter, Python 3.9+, is needed to get the next representable
# float, so non-literals are involved and the tuple can't be a compile-time constant)
corr_table = (-0.7, -0.3, 0, math.nextafter(0, math.inf),
              math.nextafter(0.3, math.inf), math.nextafter(0.7, math.inf))

def evaluate_corr_level(corr):
    # Get index into table of strings in O(log n) time
    index = bisect.bisect(corr_table, corr)

    # This table is a tuple of constant literals, so we can use it inline;
    # Python builds the constant tuple once rather than rebuilding it.
    # The actual lookup is O(1)
    return ("Strong inverse correlation", "Moderate inverse correlation",
            "Weak inverse correlation", "No correlation", "Weak correlation",
            "Moderate correlation", "Strong correlation")[index]

That's more practical for arbitrary floats. It may not save anything in this case (again, seven distinct cases isn't that many), but it's likely to lose by less: it doesn't need to normalize the input, so it has lower fixed overhead and might start winning sooner than the dict table (while also avoiding the explosion in memory usage the dict requires as the float boundaries get more precise).
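As a minimal sanity check of the bisect version, one value per band (the expected output in the comments follows from the table above):

for c in (-0.8, -0.7, -0.1, 0.0, 0.2, 0.5, 0.9):
    print(c, evaluate_corr_level(c))
# -0.8 Strong inverse correlation
# -0.7 Moderate inverse correlation
# -0.1 Weak inverse correlation
# 0.0 No correlation
# 0.2 Weak correlation
# 0.5 Moderate correlation
# 0.9 Strong correlation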

Upvotes: 3

joostblack

Reputation: 2525

Using numpy as a replacement for a switch case:

import numpy as np

def evaluate_corr_level(corr, bins):
    statements = ("Strong inverse correlation", "Moderate inverse correlation",
                  "Weak inverse correlation", "No correlation", "Weak correlation",
                  "Moderate correlation", "Strong correlation"
                  )
    # side='right' puts a value equal to a bin edge into the band above it
    return statements[np.searchsorted(bins, corr, side='right')]

# np.float was removed in NumPy 1.24; use the builtin float instead
eps = np.finfo(float).eps
bins_corr = np.array([-0.7, -0.3, 0, eps, 0.3 + eps, 0.7 + eps])

print(evaluate_corr_level(0.71, bins_corr))  # Strong correlation
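One advantage of np.searchsorted over the bisect module is that it's vectorized, so a whole array of correlations can be classified at once. A minimal sketch, assuming the labels are lifted into a numpy array so they can be fancy-indexed:

statements = np.array(["Strong inverse correlation", "Moderate inverse correlation",
                       "Weak inverse correlation", "No correlation", "Weak correlation",
                       "Moderate correlation", "Strong correlation"])
corrs = np.array([-0.9, -0.5, 0.0, 0.2, 0.8])
print(statements[np.searchsorted(bins_corr, corrs, side='right')])
# ['Strong inverse correlation' 'Moderate inverse correlation'
#  'No correlation' 'Weak correlation' 'Strong correlation']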

Upvotes: 1

Reid Moffat

Reputation: 371

A dictionary can only have one of each distinct key. Since you are using eval() for the keys, each key evaluates to either True or False, so the dictionary can hold at most two entries.

Because of this, each eval() result overrides any previous entry with the same key. Your code always prints the last two key:value pairs because their keys, True and False, overwrite all the earlier True and False keys.
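A minimal demonstration of the collapse (the conditions here are made up for illustration):

corr = 0.5
d = {
    eval("corr > 0"): "positive",   # True
    eval("corr > 10"): "big",       # False
    eval("corr < 0"): "negative",   # False, overwrites "big"
}
print(d)  # {True: 'positive', False: 'negative'}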

Upvotes: 1

Ngoc N. Tran

Reputation: 1068

When you define your dictionary, you are evaluating your conditions down to Trues and Falses, and those booleans become the keys. However, dictionary keys are unique, so for each of True and False only the last declaration survives; all the earlier ones get overwritten.
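You can see exactly this by building the question's dict by hand for corr = 0.5 and printing it (a sketch using the same conditions, written as chained comparisons instead of eval):

corr = 0.5
corr_table = {
    corr == 0: "No correlation",                          # False
    0 < corr <= 0.3: "Weak correlation",                  # False
    0.3 < corr <= 0.7: "Moderate correlation",            # True
    corr > 0.7: "Strong correlation",                     # False
    -0.3 <= corr < 0: "Weak inverse correlation",         # False
    -0.7 <= corr < 0.3: "Moderate inverse correlation",   # False
    corr < -0.7: "Strong inverse correlation",            # False, last False wins
}
print(corr_table)
# {False: 'Strong inverse correlation', True: 'Moderate correlation'}

This reproduces the question's output.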

Upvotes: 2
