Reputation: 152795
Strangely, if I display a dictionary in IPython, it seems to recalculate the hashes of the keys. This behaviour doesn't happen in the normal Python interpreter, and I would like to know what the reason for this could be.
An example:
class Fun(object):
    def __init__(self, value):
        self._value = value

    def __hash__(self):
        print('hashing')
        return hash(self._value)

    def __eq__(self, other):
        if isinstance(other, Fun):
            return self._value == other._value
        else:
            return self._value == other

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self._value)
When creating the dictionary, the hashes are obviously needed:
In [2]: dict1 = {Fun(10): 5, Fun(11): 5}
hashing
hashing
But it surprised me when I displayed the dictionary later:
In [3]: dict1
Out[3]: hashing
hashing
{Fun(11): 5, Fun(10): 5}
This doesn't happen if I use repr or items:
In [4]: dict1.items()
Out[4]: [(Fun(10), 5), (Fun(11), 5)]
In [5]: repr(dict1)
Out[5]: '{Fun(10): 5, Fun(11): 5}'
Normally I wouldn't care, but I'm looking into some performance issues with a class that has a very expensive hash method, and it seems unreasonable to me that displaying dict1 (especially as opposed to repr(dict1)) should recalculate the hashes of the keys.
But the question isn't just about the why (even though that's what really interests me); I would also be very interested in how to disable it. I'm using IPython 5.1.0.
Upvotes: 4
Views: 84
Reputation: 231665
I suspect it has something to do with putting the dictionary, or a copy, in the Out dictionary. Other methods of displaying or referencing the dictionary don't do this:
In [7]: d
Out[7]: hashing
hashing
{Fun(10): 5, Fun(11): 5}
In [8]: d;
In [9]: d
Out[9]: hashing
hashing
{Fun(10): 5, Fun(11): 5}
In [10]: d;
In [11]: print(d)
{Fun(10): 5, Fun(11): 5}
In [12]: str(d)
Out[12]: '{Fun(10): 5, Fun(11): 5}'
In [13]: repr(d)
Out[13]: '{Fun(10): 5, Fun(11): 5}'
In [21]: id(d)
Out[21]: 2977840716
In [22]: id(Out[7])
Out[22]: 2977840716
This may just be another way of looking at the pretty-print issue.
Deep copy does the rehashing, shallow does not:
In [28]: {k:v for k,v in d.items()};
hashing
hashing
In [29]: d1 = {}
In [30]: d1.update(d)
In [32]: import copy
In [33]: copy.copy(d);
In [34]: copy.deepcopy(d);
hashing
hashing
With a larger dictionary, e.g. db={Fun(i):i for i in range(15)}, the IPython display is multiline. Interestingly, though, pprint.pprint(db) prints multiline without rehashing (but with a different key order).
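To put numbers on the copy experiments above, here is a small sketch that counts hash calls instead of printing. CountingKey and hash_calls are hypothetical names made up for this illustration, and the counts rely on CPython reusing the stored hashes when it copies one dict into another, which is an implementation detail rather than a language guarantee.

```python
import copy


class CountingKey(object):
    """Hypothetical stand-in for Fun that counts __hash__ calls."""
    calls = 0

    def __init__(self, value):
        self._value = value

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self._value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self._value == other._value


def hash_calls(action):
    """Run action() and return how many times __hash__ was called."""
    CountingKey.calls = 0
    action()
    return CountingKey.calls


d = {CountingKey(10): 5, CountingKey(11): 5}

# dict-to-dict copies can reuse the hashes already stored in the source dict
shallow_calls = hash_calls(lambda: copy.copy(d))
update_calls = hash_calls(lambda: {}.update(d))

# these insert keys one at a time, so every key is hashed again
comprehension_calls = hash_calls(lambda: {k: v for k, v in d.items()})
deep_calls = hash_calls(lambda: copy.deepcopy(d))

print(shallow_calls, update_calls, comprehension_calls, deep_calls)
```

The deep copy hashes because it builds new key objects and inserts them into a fresh dict, not because it looks anything up in the original.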
Upvotes: 0
Reputation: 375942
Interesting. I added a pdb.set_trace() into the hashing function, and tried printing dict1. Once in pdb, I used the "where" command to see the stack:
In [16]: dict1
Out[16]: > <ipython-input-14-01f77f64262f>(6)__hash__()
-> print('hashing')
(Pdb) where
/usr/local/virtualenvs/lab/bin/ipython(11)<module>()
-> sys.exit(start_ipython())
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/__init__.py(119)start_ipython()
-> return launch_new_instance(argv=argv, **kwargs)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/traitlets/config/application.py(596)launch_instance()
-> app.start()
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/ipapp.py(344)start()
-> self.shell.mainloop()
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py(550)mainloop()
-> self.interact(display_banner=display_banner)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py(674)interact()
-> self.run_cell(source_raw, store_history=True)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2723)run_cell()
-> interactivity=interactivity, compiler=compiler, result=result)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2831)run_ast_nodes()
-> if self.run_code(code, result):
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2885)run_code()
-> exec(code_obj, self.user_global_ns, self.user_ns)
<ipython-input-16-8239e7494a4a>(1)<module>()
-> dict1
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/displayhook.py(246)__call__()
-> format_dict, md_dict = self.compute_format_data(result)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/displayhook.py(152)compute_format_data()
-> return self.shell.display_formatter.format(result)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(177)format()
-> data = formatter(obj)
<decorator-gen-10>(2)__call__()
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(222)catch_format_error()
-> r = method(self, *args, **kwargs)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(699)__call__()
-> printer.pretty(obj)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/lib/pretty.py(368)pretty()
-> return self.type_pprinters[cls](obj, self, cycle)
/usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/lib/pretty.py(623)inner()
-> p.pretty(obj[key])
> <ipython-input-14-01f77f64262f>(6)__hash__()
-> print('hashing')
Looks like the IPython shell is trying hard to pretty-print the result. The pretty.py code is:
for idx, key in p._enumerate(keys):
    if idx:
        p.text(',')
        p.breakable()
    p.pretty(key)
    p.text(': ')
    p.pretty(obj[key])
Looking up obj[key] involves hashing the key again.
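The effect can be reproduced without IPython. The sketch below compares a formatter that looks values up by key, in the same shape as the loop quoted above, with one that iterates over items(). CountingKey, format_by_lookup and format_by_items are hypothetical names for this illustration and are not part of IPython.

```python
class CountingKey(object):
    """Hypothetical stand-in for Fun that counts __hash__ calls."""
    calls = 0

    def __init__(self, value):
        self._value = value

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self._value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self._value == other._value

    def __repr__(self):
        return 'CountingKey({})'.format(self._value)


def format_by_lookup(obj):
    # Same shape as the quoted loop: iterate the keys, then do obj[key].
    parts = ['{!r}: {!r}'.format(key, obj[key]) for key in obj]
    return '{' + ', '.join(parts) + '}'


def format_by_items(obj):
    # Iterating items() yields the stored values without any key lookup.
    parts = ['{!r}: {!r}'.format(key, value) for key, value in obj.items()]
    return '{' + ', '.join(parts) + '}'


d = {CountingKey(10): 5, CountingKey(11): 5}

CountingKey.calls = 0
text_lookup = format_by_lookup(d)
lookup_calls = CountingKey.calls

CountingKey.calls = 0
text_items = format_by_items(d)
items_calls = CountingKey.calls

print(lookup_calls, items_calls)
```

Both functions build the same string; only the lookup variant calls __hash__ once per key.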
Can this be avoided? Not sure! ¯\_(ツ)_/¯
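One thing that might help regardless of what IPython does: if the value a key wraps never changes after construction, the key can remember its hash, so repeated lookups cost almost nothing. This is only a sketch under that assumption. CachedHashFun, _expensive_hash and computations are hypothetical names, and the built-in hash() stands in for the real expensive calculation. It doesn't stop IPython from calling __hash__; it only makes the repeat calls cheap.

```python
class CachedHashFun(object):
    """Hypothetical variant of Fun that computes its hash only once.

    Only safe if _value is never changed after construction.
    """
    computations = 0  # how many times the expensive part actually ran

    def __init__(self, value):
        self._value = value
        self._hash = None  # filled in on first use

    def _expensive_hash(self):
        # Placeholder for the real, costly hash calculation.
        CachedHashFun.computations += 1
        return hash(self._value)

    def __hash__(self):
        if self._hash is None:
            self._hash = self._expensive_hash()
        return self._hash

    def __eq__(self, other):
        if isinstance(other, CachedHashFun):
            return self._value == other._value
        return self._value == other

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self._value)


keys = [CachedHashFun(10), CachedHashFun(11)]
dict1 = {key: 5 for key in keys}

# Repeated lookups call __hash__ again, but only hit the cached value.
for _ in range(3):
    values = [dict1[key] for key in keys]

print(CachedHashFun.computations)
```

The expensive part runs once per key when the dictionary is built, and every later obj[key] lookup, including the one in the pretty printer, reuses the stored result.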
Upvotes: 4