Ryan
Ryan

Reputation: 688

Python Runtime of collections.Counter Equality

I am wondering what the big-O runtime complexity is for comparing two collections.Counter objects. Here is some code to demonstrate what I mean:

import collections
counter_1 = collections.Counter("abcabcabcabcabcabcdefg")
counter_2 = collections.Counter("xyzxyzxyzabc")
comp = counter_1 == counter_2  # What is the runtime of this comparison statement?

Is the runtime of the equality comparison in the final statement O(1)? Or is it O(num_of_unique_keys_in_largest_counter)? Or is it something else?

For reference, here is the source code for collections.Counter https://github.com/python/cpython/blob/0250de48199552cdaed5a4fe44b3f9cdb5325363/Lib/collections/init.py#L497

I do not see the class implementing an __eq()__ method.

Bonus points: If the answer to this question changes between python2 and python3, I would love to hear the difference?

Upvotes: 2

Views: 3913

Answers (2)

Reblochon Masque
Reblochon Masque

Reputation: 36702

Counter is a subclass of dict, therefore the big O analysis is the one of dict, with the caveat that Counter objects are specialized to only hold int values (i/e they can not hold collections of values as dicts can); this simplifies the analysis.

Looking at the c code implementation of the equality comparison:

  • There is an early exit if the number of keys is different. (this does not influence big-O).
  • Then a loop that iterates over all the keys that exits early if the key is not found, or if the corresponding value is different. (again, this has no bearing on big-O).
  • if all keys are found, and the corresponding values are all equal, then the dictionaries are declared equal. The lookup and comparisons of each key-value pair is O(1); this operation is repeated at most n times (n being the number of keys)

In all, the time complexity is O(n), with n the number of keys.
This applies to both python 2 and 3.


from dictobject.c

/* Return 1 if dicts equal, 0 if not, -1 if error.
 * Gets out as soon as any difference is detected.
 * Uses only Py_EQ comparison.
 */
static int
dict_equal(PyDictObject *a, PyDictObject *b)
{
    Py_ssize_t i;

    if (a->ma_used != b->ma_used)
        /* can't be equal if # of entries differ */
        return 0;
    /* Same # of entries -- check all of 'em.  Exit early on any diff. */
    for (i = 0; i < a->ma_keys->dk_nentries; i++) {
        PyDictKeyEntry *ep = &DK_ENTRIES(a->ma_keys)[i];
        PyObject *aval;
        if (a->ma_values)
            aval = a->ma_values[i];
        else
            aval = ep->me_value;
        if (aval != NULL) {
            int cmp;
            PyObject *bval;
            PyObject *key = ep->me_key;
            /* temporarily bump aval's refcount to ensure it stays
               alive until we're done with it */
            Py_INCREF(aval);
            /* ditto for key */
            Py_INCREF(key);
            /* reuse the known hash value */
            b->ma_keys->dk_lookup(b, key, ep->me_hash, &bval);
            if (bval == NULL) {
                Py_DECREF(key);
                Py_DECREF(aval);
                if (PyErr_Occurred())
                    return -1;
                return 0;
            }
            cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);
            Py_DECREF(key);
            Py_DECREF(aval);
            if (cmp <= 0)  /* error or not equal */
                return cmp;
        }
    }
    return 1;
}

Upvotes: 8

zwer
zwer

Reputation: 25809

Internally, collections.Counter stores the count as a dictionary (that's why it subclasses dict) so the same rules apply as with comparing dictionaries - namely, it compares each key with each value from both dictionaries to ensure equality. For CPython, that is implemented in dict_equal(), other implementations may vary but, logically, you have to do the each-with-each comparison to ensure equality.

This also means that the complexity is O(N) at its worst (loops through one of the dictionaries, looks if the value is the same in the other). There are no significant changes between Python 2.x and Python 3.x in this regard.

Upvotes: 1

Related Questions