How can I refactor this recursive policy expression Strategy to parameterize its length?

Question

Context

Firstly, thanks for hypothesis. It's both extremely powerful and extremely useful!

I've written a hypothesis strategy to produce monotonic (ANDS and ORs) policy expressions of the form:

(A and (B or C))

This can be thought of as a tree structure, where A, B and C are attributes at the leaf nodes, whereas 'and' and 'or' are non-leaf nodes.

The strategy seems to generate expressions as desired.

>>> find(policy_expressions(), lambda x: len(x.split()) > 3)
'(A or (A or A))'

(Perhaps the statistical diversity of examples could be improved, but that is not the essence of this question).

Inequalities are valid too. For example:

(N or (WlIorO and (nX <= 55516 and e)))

I want to constrain or filter the examples so that I can generate policy expressions with a specified number of leaf nodes (i.e. attributes).

For a performance test, I've tried using data.draw() with filter something like this:

@given(data=data())
def test_keygen_encrypt_proxy_decrypt_decrypt_execution_time(data, n):
    """
    :param n: the input size n. Number of attributes or leaf nodes in policy tree.
    """

    policy_str = data.draw(strategy=policy_expressions().filter(lambda x: len(extract_attributes(group, x)) == n),
                           label="policy string")

Where extract_attributes() counts the number of leaf nodes in the expression and n is the desired number of leaves.

The problem with this solution is that when n > 16, hypothesis throws a:

hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of hypothesis test_keygen_encrypt_proxy_decrypt_decrypt_execution_time.

I want to generate valid policy expressions with 100s of leaf nodes.

Another drawback of that approach was that hypothesis reported HealthCheck.filter_too_much and HealthCheck.too_slow and the @settings got ugly.

I would rather have a parameter to say policy_expressions(leaf_nodes=4) to get an example like this:

(N or (WlIorO and (nX <= 55516 and e)))

I avoided that initially, because I'm not able to see how to do it with the recursive strategy code.

Question

Can you suggest a way to refactor this strategy so that it can be parameterized for number of leaf nodes?

Here's the strategy code (its open source in Charm Crypto anyway)

from hypothesis.strategies import text, composite, sampled_from, characters, one_of, integers


def policy_expressions():
    return one_of(attributes(), inequalities(), policy_expression())


@composite
def policy_expression(draw):
    left = draw(policy_expressions())
    right = draw(policy_expressions())
    gate = draw(gates())
    return u'(' + u' '.join((left, gate, right)) + u')'


def attributes():
    return text(min_size=1, alphabet=characters(whitelist_categories='L', max_codepoint=0x7e))


@composite
def inequalities(draw):
    attr = draw(attributes())
    oper = draw(inequality_operators())
    numb = draw(integers(min_value=1))
    return u' '.join((attr, oper, str(numb)))


def inequality_operators():
    return sampled_from((u'<', u'>', u'<=', u'>='))


def gates():
    return sampled_from((u'or', u'and'))


def assert_valid(policy_expression):
    assert policy_expression  # not empty
    assert policy_expression.count(u'(') == policy_expression.count(u')')

https://github.com/JHUISI/charm/blob/dev/charm/toolbox/policy_expression_spec.py

Lucas Wiman · Accepted Answer

I'd suggest explicitly building in the number of leaves into how the data is constructed, then passing in the number of leaves you want:

from hypothesis.strategies import text, composite, sampled_from, characters, one_of, integers


def policy_expressions_of_size(num_leaves):
    if num_leaves == 1:
        return attributes()
    elif num_leaves == 2:
        return one_of(inequalities(), policy_expression(num_leaves))
    else:
        return policy_expression(num_leaves)


policy_expressions = integers(min_value=1, max_value=500).flatmap(policy_expressions_of_size)


@composite
def policy_expression(draw, num_leaves):
    left_leaves = draw(integers(min_value=1, max_value=num_leaves - 1))
    right_leaves = num_leaves - left_leaves
    left = draw(policy_expressions_of_size(left_leaves))
    right = draw(policy_expressions_of_size(right_leaves))
    gate = draw(gates())
    return u'(' + u' '.join((left, gate, right)) + u')'


def attributes():
    return text(min_size=1, alphabet=characters(whitelist_categories='L', max_codepoint=0x7e))


@composite
def inequalities(draw):
    attr = draw(attributes())
    oper = draw(inequality_operators())
    numb = draw(integers(min_value=1))
    return u' '.join((attr, oper, str(numb)))


def inequality_operators():
    return sampled_from((u'<', u'>', u'<=', u'>='))


def gates():
    return sampled_from((u'or', u'and'))

Then you can pick exactly how large you want the policy expression to be:

>>> policy_expressions.example()
'((((((oOjFo or (((cH and (Q or (uO > 18 and byy))) and kS) or pqKUUZ > 74)) and (gi or mwsrU <= 4115)) and qLkVSTqXZxgScTj) and (vNJ > 969 and (Drwvh or (((xhmsWhHpc or hQSMnfgyiYnblLFJ) or sesfHbQ) and jt)))) or xS) and ((V and (mArqYR or qY)) or (((uVf and bbtKUCnecMKjRJD > 18944) and nerVkPSs < 29292) and (UlOJebfbgcJz or (bxfVfjgmfulSB > 71 or (jqGLlr or (zQqj and zqUGwc < 24845)))))))'
>>> 
>>> policy_expressions_of_size(1).example()
'Eo'
>>> 
>>> policy_expressions_of_size(2).example()
'KJAitOKC > 18179'
>>> policy_expressions_of_size(10).example()
'(((htjdVy or (((XTfZil or (rqZw and DEOeER)) and xGVsdeQJLTJxLsC < 388312303) or LxLfUPljUTH)) or (Kb or EoipoYzjncAGKTE)) or bc)'
>>> policy_expressions_of_size(100).example()
'(((((CxySeUrNW or bZG) or (gzSUGgTG and (((V or n) or wqA) or veuTEnjGKwIpkDDDBiQkMwsNbxrBv))) or (((SKgQSXtAg or ChCHcEsVavy) and (((Yxj and xcCX) or QrILGAWxVKXWRb > 98817811688973569232860005374239659122) or JD <= 28510)) and KhrGfZciz > 4057857855522854443)) and (ZMIzFELKAKDMrH and (((MOmAZ and J <= 22052) or (Scy >= 17563 and (VCS and ((FFLa and EtZvqwNymnZNnjlREM) or pU)))) or A))) and ((((kaYzzIXIu and (lwos and (vp and GqG))) and ((Nh and lb) or ((TbNZWYOpYmj and (AQs or w)) or NjFYLBr > 228431293))) or ((((FTSXkXGZyKXD or zXeVEqNgkyXI) or mNGI) or ((cGOGK or gjcI) and DQzYonXszfSrZMB)) and JI > 3802)) or (((jIREd and IVzFB >= 28149) and (UdCBg < 20 or (VSGxr or XBuiS <= 1615))) and (rE > 10511139808015932 and ((((((((W and u) or yslVZ) or (eVGlz < 7033 or UiE)) and ((trOmArBc and Zx) or mPKva)) or ((qqDmKUpAnW or yvSkhTgqXQaLnxL) or Z)) or snXcMDhhf) and ((Wu or XSjbKdsZqEiXXvOb) and (DNZg and qv >= 7503))) and ((rnffxTLThwvw >= 24460 and ((oO or y <= 24926) and (NjM and vEHukii))) or ((((BTdpW and rP) or (rjUylCZwJzGobXZR or MNoBdEEIuLbTRvZHMb < 7958346708112664935)) and ((YU or gY >= 15498) and (s and GnOydthO > 103))) or ((caumKPjp < 27 and OQoFXscbD) or ((qaxYwfnelmetYqHKnatQ or P) and (ixzsvX and mYROpqoHAqeEy))))))))))'

How can I refactor this recursive policy expression Strategy to parameterize its length?

Answers (1)

Related Questions