Delicious


Python-Levenshtein Distance Error "Assertion failed!"

I'm using Python 3.4.1 from Anaconda ('3.4.1 |Anaconda 2.1.0 (64-bit)| (default, Sep 24 2014, 18:32:42) [MSC v.1600 64 bit (AMD64)]') on a Windows 8 machine.

I installed the package (version 0.12.0) with pip install python-Levenshtein; the install ran smoothly with no errors.

But then I tried to use it:

import Levenshtein as lvn
print(lvn.ratio('a', 'A'))

# This application has requested the Runtime to terminate it in an unusual way.
# Please contact the application's support team for more information.
# Assertion failed!
# 
# Program: C:\Users\Me\Anaconda-P3.x-64b\python.exe
# File: Levenshtein/_levenshtein.c, Line 726
# 
# Expression: PyUnicode_Check(arg1)
# 
# Process finished with exit code 3

I tried print(lvn.ratio.__doc__) and it prints the docs just fine. What am I missing to make this work? Is this an issue with my specific version of Python, a compiler bug, or a bug in the Levenshtein library?

The Levenshtein module source code is available on GitHub; the crashing line 726 is:

722 else if (PyObject_TypeCheck(arg1, &PyUnicode_Type)
723     && PyObject_TypeCheck(arg2, &PyUnicode_Type)) {
724   Py_UNICODE *string1, *string2;
725
726   len1 = PyUnicode_GET_SIZE(arg1);   // <-- assertion failure here
727   len2 = PyUnicode_GET_SIZE(arg2);
728   *lensum = len1 + len2;
729   string1 = PyUnicode_AS_UNICODE(arg1);
730   string2 = PyUnicode_AS_UNICODE(arg2);


I was able to reproduce this error on another Windows 8 machine with a fresh install of Python 3.4 (from the same Anaconda package mentioned above). Testing some other functions:

distance() - Exact same error.
hamming() - Same error but on line 805.
jaro() - Same error but on line 848.
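
Because each failing call aborts the whole interpreter, checks like these are easier to run in a child process. The following is a minimal sketch of that idea and is not part of python-Levenshtein: probe is a hypothetical helper name, and the arguments are simply the two strings from the example above.

```python
import subprocess
import sys


def probe(snippet):
    """Run `snippet` in a child interpreter and return (exit_code, stdout, stderr).

    A failed C assertion kills only the child, so the parent can keep going.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = proc.communicate()
    return proc.returncode, out.strip(), err.strip()


if __name__ == "__main__":
    # Functions named in the question; each is called with the same two strings.
    for func in ("ratio", "distance", "hamming", "jaro"):
        code, out, err = probe(
            "import Levenshtein as lvn; print(lvn.%s('a', 'A'))" % func
        )
        print(func, "exit code:", code, "output:", out or err)
```

A non-zero exit code (3 in the output above) marks a call that crashed, while a working build should exit with 0 and print the result.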

I then tried installing python-Levenshtein on Python 2.7, and this time I did get some warnings:

C:\Users\Thomaz\Anaconda32b\Scripts>pip.exe install python-Levenshtein
Collecting python-Levenshtein
  Using cached python-Levenshtein-0.12.0.tar.gz
Requirement already satisfied (use --upgrade to upgrade): setuptools in c:\users
\thomaz\anaconda32b\lib\site-packages\setuptools-5.8-py2.7.egg (from python-Leve
nshtein)
Installing collected packages: python-Levenshtein
  Running setup.py install for python-Levenshtein
    building 'Levenshtein._levenshtein' extension
    C:\MinGW\bin\gcc.exe -mdll -O -Wall -IC:\Users\Thomaz\Anaconda32b\include -I
C:\Users\Thomaz\Anaconda32b\PC -c Levenshtein/_levenshtein.c -o build\temp.win32
-2.7\Release\levenshtein\_levenshtein.o
    Levenshtein/_levenshtein.c: In function 'levenshtein_common':
    Levenshtein/_levenshtein.c:711:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:712:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c: In function 'hamming_py':
    Levenshtein/_levenshtein.c:796:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:797:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c: In function 'jaro_py':
    Levenshtein/_levenshtein.c:840:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:841:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c: In function 'jaro_winkler_py':
    Levenshtein/_levenshtein.c:890:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:891:13: warning: pointer targets in assignment di
ffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c: In function 'median_common':
    Levenshtein/_levenshtein.c:992:7: warning: pointer targets in passing argume
nt 1 of 'PyString_FromStringAndSize' differ in signedness [-Wpointer-sign]
           result = PyString_FromStringAndSize(medstr, len);
           ^
    In file included from C:\Users\Thomaz\Anaconda32b\include/Python.h:94:0,
                     from Levenshtein/_levenshtein.c:99:
    C:\Users\Thomaz\Anaconda32b\include/stringobject.h:62:24: note: expected 'co
nst char *' but argument is of type 'lev_byte *'
     PyAPI_FUNC(PyObject *) PyString_FromStringAndSize(const char *, Py_ssize_t)
;
                            ^
    Levenshtein/_levenshtein.c: In function 'median_improve_common':
    C:\Users\Thomaz\Anaconda32b\include/stringobject.h:91:32: warning: pointer t
argets in initialization differ in signedness [-Wpointer-sign]
     #define PyString_AS_STRING(op) (((PyStringObject *)(op))->ob_sval)
                                    ^
    Levenshtein/_levenshtein.c:1071:19: note: in expansion of macro 'PyString_AS
_STRING'
         lev_byte *s = PyString_AS_STRING(arg1);
                       ^
    Levenshtein/_levenshtein.c:1077:7: warning: pointer targets in passing argum
ent 1 of 'PyString_FromStringAndSize' differ in signedness [-Wpointer-sign]
           result = PyString_FromStringAndSize(medstr, len);
           ^
    In file included from C:\Users\Thomaz\Anaconda32b\include/Python.h:94:0,
                     from Levenshtein/_levenshtein.c:99:
    C:\Users\Thomaz\Anaconda32b\include/stringobject.h:62:24: note: expected 'co
nst char *' but argument is of type 'lev_byte *'
     PyAPI_FUNC(PyObject *) PyString_FromStringAndSize(const char *, Py_ssize_t)
;
                            ^
    Levenshtein/_levenshtein.c: In function 'extract_stringlist':
    Levenshtein/_levenshtein.c:1201:16: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         strings[0] = PyString_AS_STRING(first);
                    ^
    Levenshtein/_levenshtein.c:1213:18: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
           strings[i] = PyString_AS_STRING(item);
                      ^
    Levenshtein/_levenshtein.c: In function 'editops_py':
    Levenshtein/_levenshtein.c:1650:13: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:1651:13: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c: In function 'opcodes_py':
    Levenshtein/_levenshtein.c:1768:13: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:1769:13: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c: In function 'apply_edit_py':
    Levenshtein/_levenshtein.c:1863:13: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         string1 = PyString_AS_STRING(arg1);
                 ^
    Levenshtein/_levenshtein.c:1864:13: warning: pointer targets in assignment d
iffer in signedness [-Wpointer-sign]
         string2 = PyString_AS_STRING(arg2);
                 ^
    Levenshtein/_levenshtein.c:1878:7: warning: pointer targets in passing argum
ent 1 of 'PyString_FromStringAndSize' differ in signedness [-Wpointer-sign]
           result = PyString_FromStringAndSize(s, len);
           ^
    In file included from C:\Users\Thomaz\Anaconda32b\include/Python.h:94:0,
                     from Levenshtein/_levenshtein.c:99:
    C:\Users\Thomaz\Anaconda32b\include/stringobject.h:62:24: note: expected 'co
nst char *' but argument is of type 'lev_byte *'
     PyAPI_FUNC(PyObject *) PyString_FromStringAndSize(const char *, Py_ssize_t)
;
                            ^
    Levenshtein/_levenshtein.c:1894:7: warning: pointer targets in passing argum
ent 1 of 'PyString_FromStringAndSize' differ in signedness [-Wpointer-sign]
           result = PyString_FromStringAndSize(s, len);
           ^
    In file included from C:\Users\Thomaz\Anaconda32b\include/Python.h:94:0,
                     from Levenshtein/_levenshtein.c:99:
    C:\Users\Thomaz\Anaconda32b\include/stringobject.h:62:24: note: expected 'co
nst char *' but argument is of type 'lev_byte *'
     PyAPI_FUNC(PyObject *) PyString_FromStringAndSize(const char *, Py_ssize_t)
;
                            ^
    Levenshtein/_levenshtein.c: At top level:
    Levenshtein/_levenshtein.c:6630:1: warning: 'lev_editops_total_cost' defined
 but not used [-Wunused-function]
     lev_editops_total_cost(size_t n,
     ^
    Levenshtein/_levenshtein.c:6700:1: warning: 'lev_opcodes_total_cost' defined
 but not used [-Wunused-function]
     lev_opcodes_total_cost(size_t nb,
     ^
    Levenshtein/_levenshtein.c:6655:1: warning: 'lev_editops_normalize' defined
but not used [-Wunused-function]
     lev_editops_normalize(size_t n,
     ^
    Levenshtein/_levenshtein.c:2371:1: warning: 'lev_edit_distance_sod' defined
but not used [-Wunused-function]
     lev_edit_distance_sod(size_t len, const lev_byte *string,
     ^
    Levenshtein/_levenshtein.c:2550:1: warning: 'lev_u_edit_distance_sod' define
d but not used [-Wunused-function]
     lev_u_edit_distance_sod(size_t len, const lev_wchar *string,
     ^
    C:\MinGW\bin\gcc.exe -shared -s build\temp.win32-2.7\Release\levenshtein\_le
venshtein.o build\temp.win32-2.7\Release\levenshtein\_levenshtein.def -LC:\Users
\Thomaz\Anaconda32b\libs -LC:\Users\Thomaz\Anaconda32b\PCbuild -lpython27 -lmsvc
r90 -o build\lib.win32-2.7\Levenshtein\_levenshtein.pyd
Successfully installed python-Levenshtein-0.12.0

Interesting... In Python 2.7, even with the warnings above, everything works.



I uninstalled the package and then installed the build from http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-levenshtein instead, and now I can use python-Levenshtein just fine in Python 3.4.
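
When comparing a build that crashes with one that works, it may help to record which compiler built the interpreter and which extension file actually gets loaded. This is only a small sketch: build_info is a hypothetical helper, and nothing in it is specific to python-Levenshtein.

```python
import importlib
import platform
import sys


def build_info(module_name):
    """Collect interpreter build details and the file a module is loaded from."""
    try:
        module = importlib.import_module(module_name)
        location = getattr(module, "__file__", None)
    except ImportError:
        location = None  # module is not installed in this environment
    return {
        "python": sys.version.split()[0],
        # Compiler that built the interpreter itself, not necessarily the extension.
        "compiler": platform.python_compiler(),
        "file": location,
    }


if __name__ == "__main__":
    # "Levenshtein._levenshtein" is the extension name shown in the build log.
    print(build_info("Levenshtein._levenshtein"))
```

Running it once with each install shows whether the same interpreter and the same .pyd file are involved in both cases.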

Upvotes: 2

Views: 813

Answers (1)

This is only a partial answer: I do not have Windows myself, so I cannot debug this. However, just looking at the source code suggests that this bug should be impossible; it is possibly a compiler bug, a really nasty case of undefined behavior (UB), or something similar.

There could be something wrong with how your extension gets compiled. Before line 726, lines 722 and 723 explicitly check that the arguments are Unicode objects or subclasses thereof (if they are not, the entire if branch is skipped and line 726 does not run):

722 else if (PyObject_TypeCheck(arg1, &PyUnicode_Type)
723     && PyObject_TypeCheck(arg2, &PyUnicode_Type)) {
724   Py_UNICODE *string1, *string2;
725
726   len1 = PyUnicode_GET_SIZE(arg1);
727   len2 = PyUnicode_GET_SIZE(arg2);
728   *lensum = len1 + len2;
729   string1 = PyUnicode_AS_UNICODE(arg1);
730   string2 = PyUnicode_AS_UNICODE(arg2);

PyUnicode_GET_SIZE is a macro containing two assert statements, and both of them are expanded into line 726:

#define PyUnicode_GET_SIZE(op)                       \
    (assert(PyUnicode_Check(op)),                    \
     (((PyASCIIObject *)(op))->wstr) ?               \
      PyUnicode_WSTR_LENGTH(op) :                    \
      ((void)PyUnicode_AsUnicode((PyObject *)(op)),  \
       assert(((PyASCIIObject *)(op))->wstr),        \
       PyUnicode_WSTR_LENGTH(op)))

The first one asserts that the object is indeed a Unicode object or a subclass thereof by checking the return value of PyUnicode_Check:

#define PyUnicode_Check(op) \
    PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_UNICODE_SUBCLASS)

The second one asserts that PyUnicode_AsUnicode(op) has set the wstr member properly. The expression PyUnicode_Check(arg1) in your error output indicates that it is the first assertion that fails here.

Now the funny part is that it shouldn't be possible for this to fail: the PyObject_TypeCheck succeeded and the argument is a Unicode object (a Python 3 str), so the assertion should pass as well.
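
The two checks can be compared from Python itself. The sketch below is only a rough model under CPython: it assumes that Py_TPFLAGS_UNICODE_SUBCLASS is bit 28 of tp_flags (as defined in CPython's object.h) and that type.__flags__ exposes those flags. The function and class names are made up for illustration.

```python
# Value of Py_TPFLAGS_UNICODE_SUBCLASS in CPython's object.h (1UL << 28).
PY_TPFLAGS_UNICODE_SUBCLASS = 1 << 28


def type_check(obj):
    """Rough model of PyObject_TypeCheck(obj, &PyUnicode_Type): walk the MRO."""
    return str in type(obj).__mro__


def fast_subclass_check(obj):
    """Rough model of PyUnicode_Check(obj): test one bit of the type's flags."""
    return bool(type(obj).__flags__ & PY_TPFLAGS_UNICODE_SUBCLASS)


class MyStr(str):
    """A str subclass; it inherits the Unicode-subclass flag from str."""


if __name__ == "__main__":
    for value in ("a", MyStr("a"), b"a", 1):
        print(repr(value), type_check(value), fast_subclass_check(value))
```

On a healthy interpreter the two results agree for every value, which is why the assertion is expected to hold whenever the check on line 722 has passed.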

Upvotes: 1
