Ben

Reputation: 59

Python simhash doesn't work on Ubuntu

The same setup and code for running simhash works on my Mac.

But when I run it on Ubuntu, it fails with an error raised inside the simhash implementation itself.

Has anyone encountered this problem?

    objs = [(str(k), Simhash(v)) for k, v in index_data.items()]
  File "/usr/local/lib/python2.7/dist-packages/simhash-1.1.2-py2.7.egg/simhash/__init__.py", line 30, in __init__
    self.build_by_text(unicode(value))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 34: ordinal not in range(128)

Upvotes: 0

Views: 319

Answers (1)

Daniel

Reputation: 411

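The error tells you that a byte string can't be decoded with the default ascii codec. The traceback ends in unicode(value) inside Simhash, so the value v is the most likely culprit. Since I don't know where the data is coming from or what it actually contains, I can only say that something like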

str(k).decode('cp850')

or

Simhash(v.decode('cp850'))

might help, assuming the string is encoded in cp850. At least '\xf6'.decode('cp850') works for me.

Since the failure happens inside the module, make sure the strings you pass in are properly decoded beforehand.
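As a rough sketch of "decode beforehand": the helper below (to_text is a hypothetical name, not part of simhash) turns byte strings into text with an explicit codec before anything else touches them. The sample bytes and the choice of cp850 versus latin-1 are assumptions for illustration; the right codec depends on where the data actually comes from.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="cp850"):
    """Return text, decoding byte strings with an explicit codec.

    Hypothetical helper: the default of cp850 is only an assumption,
    use whatever encoding the data was really written in.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is the same type as str
        return value.decode(encoding)
    return value  # already text, leave it alone


raw = b"sch\xf6n"  # made-up sample containing the 0xf6 byte from the traceback

# The implicit ascii decoding is what fails for any byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# The same byte maps to different characters depending on the codec.
as_cp850 = to_text(raw, "cp850")      # 0xf6 becomes U+00F7 (division sign)
as_latin1 = to_text(raw, "latin-1")   # 0xf6 becomes U+00F6 (o with diaeresis)
```

With something like this in place, the list comprehension from the question could pass Simhash(to_text(v, 'latin-1')) instead of Simhash(v). If the decoded text looks wrong (for example a division sign where an umlaut is expected), the codec guess is probably off.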

Upvotes: 0
