ericgcc

How to use DeepDiff with custom_operators and iterable_compare_func altogether?

I have to use DeepDiff to compare two lists of dictionaries that contain metrics for several variables. The comparison should work like this: if the new metrics are equal to or greater than the previous ones, no difference should be shown for that dictionary, but if any of the new metrics is smaller, the difference should be reported.

This is a sample of my lists:

old = [
    {
        'variable': 'location',
        'accuracy': 0.6338672768878718,
        'coverage': 0.9278131634819533,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.7156488549618321,
        'coverage': 0.16129032258064516,
    },
    {
        'variable': 'years_in_business',
        'accuracy': 0.8686224489795918,
        'coverage': 0.48590021691973967,
    },

]

new = [    
    {
        'variable': 'location',
        'accuracy': 0.6227561657767604,
        'coverage': 0.9267020523708422,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.8883720930232558,
        'coverage': 0.49710982658959535,
    },
    {
        'variable': 'years_in_business',
        'accuracy': 0.8564488549618321,
        'coverage': 0.4124206283185841,
    },
]

Using the following custom operator, I have managed to perform the comparison as described without any problem:

from math import isclose

from deepdiff.operator import BaseOperator


class CloseToOrGreatherThan(BaseOperator):
    def __init__(self, types):
        super().__init__(types=types)

    def give_up_diffing(self, level, diff_instance) -> bool:
        # DeepDiff is called as DeepDiff(new, old, ...), so t1 is the new dict and t2 the old one
        new = level.t1
        old = level.t2

        # metrics are compared as percentages rounded to one decimal;
        # values within 0.9 points of each other are treated as equal
        new_acc = round(new["accuracy"] * 100, 1)
        base_acc = round(old["accuracy"] * 100, 1)
        acc_diff = isclose(new_acc, base_acc, abs_tol=0.9) or new_acc > base_acc  # new >= old?

        new_cvr = round(new["coverage"] * 100, 1)
        base_cvr = round(old["coverage"] * 100, 1)
        cvr_diff = isclose(new_cvr, base_cvr, abs_tol=0.9) or new_cvr > base_cvr  # new >= old?

        # if either the new accuracy or the new coverage is lower than the old one, report the difference
        if not (acc_diff and cvr_diff):
            report = f"accuracy: {old['accuracy']} => {new['accuracy']}" if not acc_diff else ""
            report = " / ".join(s for s in [report, f"coverage: {old['coverage']} => {new['coverage']}"] if s) if not cvr_diff else report
            # the variable name becomes the report key, e.g. {'location': {'root[0]': ...}}
            diff_instance.custom_report_result(old['variable'], level, report)
        # returning True tells DeepDiff not to diff the two dicts any further
        return True

Executing DeepDiff like this:

from deepdiff import DeepDiff

DeepDiff(new, old, custom_operators=[CloseToOrGreatherThan(types=[dict])])

the output is as expected:

{'location': {'root[0]': 'accuracy: 0.6338672768878718 => 0.6227561657767604'}, 'years_in_business': {'root[2]': 'accuracy: 0.8686224489795918 => 0.8564488549618321 / coverage: 0.48590021691973967 => 0.4124206283185841'}}
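To see why location appears with only its accuracy in that output, the operator's check can be worked through by hand for that one item. The snippet below repeats the same arithmetic outside DeepDiff (percentages rounded to one decimal, abs_tol=0.9); the variable names are only for illustration.

```python
from math import isclose

# 'location' item, using the same rule as CloseToOrGreatherThan:
# percentages rounded to one decimal, "not worse" means within 0.9 points or higher.
base_acc = round(0.6338672768878718 * 100, 1)  # old accuracy -> 63.4
new_acc = round(0.6227561657767604 * 100, 1)   # new accuracy -> 62.3
acc_ok = isclose(new_acc, base_acc, abs_tol=0.9) or new_acc > base_acc

base_cvr = round(0.9278131634819533 * 100, 1)  # old coverage -> 92.8
new_cvr = round(0.9267020523708422 * 100, 1)   # new coverage -> 92.7
cvr_ok = isclose(new_cvr, base_cvr, abs_tol=0.9) or new_cvr > base_cvr

print(acc_ok, cvr_ok)  # False True: only the accuracy drop (about 1.1 points) is reported
```

The same reasoning explains why operating_name is absent: both of its new metrics are higher than the old ones.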

However, if the variables appear in a different order in the two lists, the comparison no longer works. For example, if the lists are now like this:

old = [
    {
        'variable': 'location',
        'accuracy': 0.6338672768878718,
        'coverage': 0.9278131634819533,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.7156488549618321,
        'coverage': 0.16129032258064516,
    },
    {
        'variable': 'years_in_business',
        'accuracy': 0.8686224489795918,
        'coverage': 0.48590021691973967,
    },

]

new = [
    {
        'variable': 'years_in_business',
        'accuracy': 0.8564488549618321,
        'coverage': 0.4124206283185841,
    },
    {
        'variable': 'location',
        'accuracy': 0.6227561657767604,
        'coverage': 0.9267020523708422,
    },
    {
        'variable': 'operating_name',
        'accuracy': 0.8883720930232558,
        'coverage': 0.49710982658959535,
    },
]

DeepDiff then pairs the items by position, so it compares location vs. years_in_business, operating_name vs. location, and years_in_business vs. operating_name.
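The mismatch described above is what plain index-by-index pairing produces. The snippet below reproduces it with zip(); it only illustrates the pairing reported in the question and is not DeepDiff's internal code. The metrics are left out because only the variable names matter here.

```python
# Only the 'variable' names matter for the pairing, so the metrics are left out.
old = [{"variable": "location"}, {"variable": "operating_name"}, {"variable": "years_in_business"}]
new = [{"variable": "years_in_business"}, {"variable": "location"}, {"variable": "operating_name"}]

# Pairing by position: item i of one list is compared with item i of the other.
positional_pairs = [(o["variable"], n["variable"]) for o, n in zip(old, new)]
print(positional_pairs)
# [('location', 'years_in_business'), ('operating_name', 'location'), ('years_in_business', 'operating_name')]
```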

I have tried to use iterable_compare_func to tell DeepDiff which items should be compared, but it doesn't work as I expect. What I want is to compare an item from the old list with an item from the new list if and only if both have the same variable name:

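from deepdiff.helper import CannotCompare


def compare(x, y, level):
    try:
        # pair an old item with a new item only if both describe the same variable
        return x['variable'] == y['variable']
    except Exception:
        raise CannotCompare() from None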

When I call DeepDiff with both parameters, it returns {}:

DeepDiff(new, old, custom_operators=[CloseToOrGreatherThan(types=[dict])], iterable_compare_func=compare)

Calling DeepDiff with verbose_level=2 returns the following, which is not the result I expect:

{'iterable_item_moved': {'root[0]': {'new_path': 'root[2]', 'value': {'variable': 'years_in_business', 'accuracy': 0.8686224489795918, 'coverage': 0.48590021691973967}}, 'root[1]': {'new_path': 'root[0]', 'value': {'variable': 'location', 'accuracy': 0.6338672768878718, 'coverage': 0.9278131634819533}}, 'root[2]': {'new_path': 'root[1]', 'value': {'variable': 'operating_name', 'accuracy': 0.7156488549618321, 'coverage': 0.16129032258064516}}}}
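For reference, the order-independent result being asked for can be sketched in plain Python by pairing on the variable name, as compare() intends, and then applying the same threshold rule as CloseToOrGreatherThan. This is not a DeepDiff feature: regressed and regressions_by_variable are hypothetical helper names, the rule is copied from the operator above (percentages rounded to one decimal, abs_tol=0.9), and the result is a flat {variable: message} dict rather than DeepDiff's report format. Because it pairs the items before comparing them, it is the kind of preprocessing the question hopes to avoid, so treat it only as a way to check what DeepDiff should report.

```python
from math import isclose

old = [
    {"variable": "location", "accuracy": 0.6338672768878718, "coverage": 0.9278131634819533},
    {"variable": "operating_name", "accuracy": 0.7156488549618321, "coverage": 0.16129032258064516},
    {"variable": "years_in_business", "accuracy": 0.8686224489795918, "coverage": 0.48590021691973967},
]
new = [
    {"variable": "years_in_business", "accuracy": 0.8564488549618321, "coverage": 0.4124206283185841},
    {"variable": "location", "accuracy": 0.6227561657767604, "coverage": 0.9267020523708422},
    {"variable": "operating_name", "accuracy": 0.8883720930232558, "coverage": 0.49710982658959535},
]


def regressed(new_value, old_value, tol=0.9):
    # Same rule as CloseToOrGreatherThan: compare percentages rounded to one decimal;
    # a metric has regressed unless it is within `tol` points of the old value or higher.
    new_pct = round(new_value * 100, 1)
    old_pct = round(old_value * 100, 1)
    return not (isclose(new_pct, old_pct, abs_tol=tol) or new_pct > old_pct)


def regressions_by_variable(old, new, metrics=("accuracy", "coverage")):
    # Index the old items by 'variable' so pairing does not depend on list order.
    old_by_name = {item["variable"]: item for item in old}
    report = {}
    for new_item in new:
        old_item = old_by_name.get(new_item["variable"])
        if old_item is None:
            continue  # variable only exists in the new list; nothing to compare
        parts = [
            f"{m}: {old_item[m]} => {new_item[m]}"
            for m in metrics
            if regressed(new_item[m], old_item[m])
        ]
        if parts:
            report[new_item["variable"]] = " / ".join(parts)
    return report


print(regressions_by_variable(old, new))
```

For the reordered lists this flags location (accuracy only) and years_in_business (accuracy and coverage), the same items that the custom operator reported when the lists were in the same order.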

Do you have any idea what I'm doing wrong, or how I can achieve this? If possible, I would like to solve it with DeepDiff's own capabilities rather than preprocessing the lists before calling it.

Answers (1)

Anton Radoilski

Have you tried the 'ignore_order' parameter?
