MSeifert

Reputation: 152795

Displaying a dictionary in IPython recalculates the hashes

Strangely, when I display a dictionary in IPython, it seems to recalculate the hashes of the keys. This doesn't happen in the normal Python interpreter, and I would like to know what the reason for this could be.

An example:

class Fun(object):
    def __init__(self, value):
        self._value = value

    def __hash__(self):
        print('hashing')
        return hash(self._value)

    def __eq__(self, other):
        if isinstance(other, Fun):
            return self._value == other._value
        else:
            return self._value == other

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self._value)

When creating the dictionary, the hashes are obviously needed:

In [2]: dict1 = {Fun(10): 5, Fun(11): 5}
hashing
hashing

But it surprised me when I displayed the dictionary later:

In [3]: dict1
Out[3]: hashing
hashing
{Fun(11): 5, Fun(10): 5}

This doesn't happen if I use repr or items:

In [4]: dict1.items()
Out[4]: [(Fun(10), 5), (Fun(11), 5)]

In [5]: repr(dict1)
Out[5]: '{Fun(10): 5, Fun(11): 5}'

Normally I wouldn't care, but I'm looking into some performance issues with a class that has a very expensive hash method, and it seems unreasonable to me that displaying dict1 (especially as opposed to repr(dict1)) should recalculate the hashes of the keys.

But the question isn't just about the why (even though that's what really interests me); I would also be very interested in how to disable it. I'm using IPython 5.1.0.

Upvotes: 4

Views: 84

Answers (2)

hpaulj

Reputation: 231665

I suspect it has something to do with putting the dictionary, or a copy, in the Out dictionary. Other methods of displaying or referencing the dictionary don't do this:

In [7]: d
Out[7]: hashing
hashing
{Fun(10): 5, Fun(11): 5}
In [8]: d;
In [9]: d
Out[9]: hashing
hashing
{Fun(10): 5, Fun(11): 5}
In [10]: d;
In [11]: print(d)
{Fun(10): 5, Fun(11): 5}
In [12]: str(d)
Out[12]: '{Fun(10): 5, Fun(11): 5}'
In [13]: repr(d)
Out[13]: '{Fun(10): 5, Fun(11): 5}'

In [21]: id(d)
Out[21]: 2977840716
In [22]: id(Out[7])
Out[22]: 2977840716

This may just be another way of looking at the pretty-print issue.

A deep copy (like a dict comprehension) rehashes the keys; a shallow copy or update does not:

In [28]: {k:v for k,v in d.items()};
hashing
hashing
In [29]: d1 = {}
In [30]: d1.update(d)
In [32]: import copy
In [33]: copy.copy(d);
In [34]: copy.deepcopy(d);
hashing
hashing

With a larger dictionary, e.g. db={Fun(i):i for i in range(15)}, the IPython display is multiline. Interestingly, though, pprint.pprint(db) prints multiline without rehashing (but with a different key order).
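
The copy observations above can be reproduced outside IPython by counting calls instead of printing. This is only a sketch, assuming CPython 3; CountingKey and count_hashes are hypothetical names introduced here, not part of the question's code:

```python
import copy


class CountingKey:
    """Stand-in for Fun that counts __hash__ calls instead of printing."""
    calls = 0

    def __init__(self, value):
        self._value = value

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self._value)

    def __eq__(self, other):
        if isinstance(other, CountingKey):
            return self._value == other._value
        return self._value == other


def count_hashes(action):
    """Run action() and return how many times __hash__ was called."""
    CountingKey.calls = 0
    action()
    return CountingKey.calls


d = {CountingKey(10): 5, CountingKey(11): 5}

results = {
    "copy.copy": count_hashes(lambda: copy.copy(d)),
    "dict.update": count_hashes(lambda: {}.update(d)),
    "comprehension": count_hashes(lambda: {k: v for k, v in d.items()}),
    "copy.deepcopy": count_hashes(lambda: copy.deepcopy(d)),
}
print(results)
```

As far as I can tell, the difference is that a CPython dict keeps each key's hash next to the entry, so copying from an existing dict can reuse it, while the comprehension and deepcopy insert keys one at a time through the normal path, which calls __hash__.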

Upvotes: 0

Ned Batchelder

Reputation: 375942

Interesting. I added a pdb.set_trace() into the hashing function, and tried printing dict1. Once in pdb, I used the "where" command to see the stack:

In [16]: dict1
Out[16]: > <ipython-input-14-01f77f64262f>(6)__hash__()
-> print('hashing')
(Pdb) where
  /usr/local/virtualenvs/lab/bin/ipython(11)<module>()
-> sys.exit(start_ipython())
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/__init__.py(119)start_ipython()
-> return launch_new_instance(argv=argv, **kwargs)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/traitlets/config/application.py(596)launch_instance()
-> app.start()
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/ipapp.py(344)start()
-> self.shell.mainloop()
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py(550)mainloop()
-> self.interact(display_banner=display_banner)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/terminal/interactiveshell.py(674)interact()
-> self.run_cell(source_raw, store_history=True)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2723)run_cell()
-> interactivity=interactivity, compiler=compiler, result=result)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2831)run_ast_nodes()
-> if self.run_code(code, result):
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/interactiveshell.py(2885)run_code()
-> exec(code_obj, self.user_global_ns, self.user_ns)
  <ipython-input-16-8239e7494a4a>(1)<module>()
-> dict1
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/displayhook.py(246)__call__()
-> format_dict, md_dict = self.compute_format_data(result)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/displayhook.py(152)compute_format_data()
-> return self.shell.display_formatter.format(result)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(177)format()
-> data = formatter(obj)
  <decorator-gen-10>(2)__call__()
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(222)catch_format_error()
-> r = method(self, *args, **kwargs)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/core/formatters.py(699)__call__()
-> printer.pretty(obj)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/lib/pretty.py(368)pretty()
-> return self.type_pprinters[cls](obj, self, cycle)
  /usr/local/virtualenvs/lab/lib/python2.7/site-packages/IPython/lib/pretty.py(623)inner()
-> p.pretty(obj[key])
> <ipython-input-14-01f77f64262f>(6)__hash__()
-> print('hashing')

Looks like the IPython shell is trying hard to pretty-print the result. The relevant pretty.py code is:

for idx, key in p._enumerate(keys):
    if idx:
        p.text(',')
        p.breakable()
    p.pretty(key)
    p.text(': ')
    p.pretty(obj[key])

Looking up obj[key] involves hashing the key again.
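
To see the effect in isolation, here is a rough sketch that imitates the two access patterns without IPython. The names CountingKey, render_by_lookup and render_by_items are made up for this illustration; render_by_lookup only mirrors the shape of the loop above and is not IPython's actual code:

```python
class CountingKey:
    """Stand-in for Fun that counts __hash__ calls instead of printing."""
    calls = 0

    def __init__(self, value):
        self._value = value

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self._value)

    def __eq__(self, other):
        if isinstance(other, CountingKey):
            return self._value == other._value
        return self._value == other

    def __repr__(self):
        return 'CountingKey({})'.format(self._value)


def render_by_lookup(obj):
    # Iterate over the keys and fetch each value with obj[key],
    # which performs a fresh lookup and therefore hashes the key.
    parts = ['{!r}: {!r}'.format(key, obj[key]) for key in obj.keys()]
    return '{' + ', '.join(parts) + '}'


def render_by_items(obj):
    # Iterate over (key, value) pairs directly; no lookup is needed.
    parts = ['{!r}: {!r}'.format(key, value) for key, value in obj.items()]
    return '{' + ', '.join(parts) + '}'


d = {CountingKey(10): 5, CountingKey(11): 5}

CountingKey.calls = 0
text_lookup = render_by_lookup(d)
lookup_calls = CountingKey.calls

CountingKey.calls = 0
text_items = render_by_items(d)
items_calls = CountingKey.calls

print(lookup_calls, items_calls)
```

Both functions produce the same text; only the lookup variant touches __hash__, once per key.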

Can this be avoided? Not sure! ¯\_(ツ)_/¯
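
If the real concern is the cost of the hash rather than the call itself, one workaround that does not depend on IPython is to compute the hash once per object and reuse it. This is a minimal sketch, assuming the wrapped value never changes after construction; CachedHashFun and _expensive_hash are hypothetical names, and the hash(self._value) call stands in for whatever the expensive computation really is:

```python
class CachedHashFun:
    """Variant of Fun that computes its hash once and then reuses it.

    Assumes _value is never modified after construction.
    """
    expensive_calls = 0  # counts how often the costly path actually runs

    def __init__(self, value):
        self._value = value
        self._hash = None

    def _expensive_hash(self):
        # Placeholder for the genuinely expensive computation.
        CachedHashFun.expensive_calls += 1
        return hash(self._value)

    def __hash__(self):
        if self._hash is None:
            self._hash = self._expensive_hash()
        return self._hash

    def __eq__(self, other):
        if isinstance(other, CachedHashFun):
            return self._value == other._value
        return self._value == other

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self._value)


d = {CachedHashFun(10): 5, CachedHashFun(11): 5}
for key in d:
    d[key]  # each lookup calls __hash__, but only the cached value is returned
print(CachedHashFun.expensive_calls)
```

IPython would still call __hash__ when displaying the dict, but each call after the first is just an attribute read.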

Upvotes: 4
