Reputation: 688
I am wondering what the big-O runtime complexity is for comparing two collections.Counter
objects. Here is some code to demonstrate what I mean:
import collections
counter_1 = collections.Counter("abcabcabcabcabcabcdefg")
counter_2 = collections.Counter("xyzxyzxyzabc")
comp = counter_1 == counter_2 # What is the runtime of this comparison statement?
Is the runtime of the equality comparison in the final statement O(1)? Or is it O(num_of_unique_keys_in_largest_counter)? Or is it something else?
For reference, here is the source code for collections.Counter:
https://github.com/python/cpython/blob/0250de48199552cdaed5a4fe44b3f9cdb5325363/Lib/collections/__init__.py#L497
I do not see the class implementing an __eq__() method.
Bonus points: if the answer to this question changes between Python 2 and Python 3, I would love to hear the difference.
Upvotes: 2
Views: 3913
Reputation: 36702
Counter is a subclass of dict, therefore the big-O analysis is that of dict, with the caveat that Counter objects are specialized to hold only int values (i.e. they cannot hold collections of values as dicts can); this simplifies the analysis.
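You can check this relationship yourself with a couple of lines. The second check is version dependent: it holds on interpreters (like the revision the question links to) where Counter does not define its own __eq__, so the lookup falls through to dict.__eq__; newer releases add rich comparisons to Counter, so treat this as an illustrative sketch rather than a guarantee.
import collections

# Counter is a dict subclass, so dict's C-level machinery is what runs.
print(issubclass(collections.Counter, dict))      # True

# On versions where Counter defines no __eq__ of its own, the equality
# operator resolves to dict.__eq__ through the MRO.
print(collections.Counter.__eq__ is dict.__eq__)  # True on such versions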
Looking at the C code implementation of the equality comparison (quoted below, with a pure-Python sketch of the same logic after it): it first compares the number of entries, which is O(1); if the sizes match, it looks up each key of one dict in the other and compares the two values, which is O(1) on average per key; this operation is repeated at most n times (n being the number of keys). In all, the time complexity is O(n), with n the number of keys.
This applies to both Python 2 and Python 3.
From dictobject.c:
/* Return 1 if dicts equal, 0 if not, -1 if error.
 * Gets out as soon as any difference is detected.
 * Uses only Py_EQ comparison.
 */
static int
dict_equal(PyDictObject *a, PyDictObject *b)
{
    Py_ssize_t i;

    if (a->ma_used != b->ma_used)
        /* can't be equal if # of entries differ */
        return 0;
    /* Same # of entries -- check all of 'em. Exit early on any diff. */
    for (i = 0; i < a->ma_keys->dk_nentries; i++) {
        PyDictKeyEntry *ep = &DK_ENTRIES(a->ma_keys)[i];
        PyObject *aval;
        if (a->ma_values)
            aval = a->ma_values[i];
        else
            aval = ep->me_value;
        if (aval != NULL) {
            int cmp;
            PyObject *bval;
            PyObject *key = ep->me_key;
            /* temporarily bump aval's refcount to ensure it stays
               alive until we're done with it */
            Py_INCREF(aval);
            /* ditto for key */
            Py_INCREF(key);
            /* reuse the known hash value */
            b->ma_keys->dk_lookup(b, key, ep->me_hash, &bval);
            if (bval == NULL) {
                Py_DECREF(key);
                Py_DECREF(aval);
                if (PyErr_Occurred())
                    return -1;
                return 0;
            }
            cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);
            Py_DECREF(key);
            Py_DECREF(aval);
            if (cmp <= 0)  /* error or not equal */
                return cmp;
        }
    }
    return 1;
}
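For readers who don't want to parse the C, here is a rough pure-Python sketch of the same algorithm. It is illustrative only, not the actual implementation; the name dicts_equal is made up for this example.
def dicts_equal(a, b):
    # Different sizes can never be equal -- O(1) early exit.
    if len(a) != len(b):
        return False
    # Same size: look up every key of a in b and compare the values.
    # Each lookup/compare is O(1) on average, repeated n times -> O(n).
    for key, a_val in a.items():
        if key not in b or b[key] != a_val:
            return False
    return True

print(dicts_equal({'a': 1, 'b': 2}, {'b': 2, 'a': 1}))  # True
print(dicts_equal({'a': 1}, {'a': 2}))                  # False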
Upvotes: 8
Reputation: 25809
Internally, collections.Counter stores the counts in a dictionary (that's why it subclasses dict), so the same rules apply as for comparing dictionaries - namely, each key and its associated value must be compared across both dictionaries to establish equality. For CPython, that is implemented in dict_equal(); other implementations may vary, but logically you still have to do that key-by-key comparison to ensure equality.
This also means that the complexity is O(N) at worst (it loops through one of the dictionaries and checks whether each value is the same in the other). There are no significant changes between Python 2.x and Python 3.x in this regard.
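A quick, unscientific way to see the linear behaviour yourself, using only the standard library; the timings below compare equal Counters (the worst case, since there is no early exit), and the exact numbers will of course vary by machine.
import collections
import timeit

# Equality time should grow roughly linearly with the number of unique keys.
for n in (1_000, 10_000, 100_000):
    c1 = collections.Counter({i: 1 for i in range(n)})
    c2 = collections.Counter({i: 1 for i in range(n)})
    t = timeit.timeit(lambda: c1 == c2, number=100)
    print(n, round(t, 4))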
Upvotes: 1