mCs

Reputation: 2921

UnstructuredURLLoader not able to see libmagic

I tried to use UnstructuredURLLoader as below

from langchain.document_loaders import UnstructuredURLLoader

loaders = UnstructuredURLLoader(urls=urls)
data = loaders.load()

but some pages report that

libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://wellfound.com/company/chorus-one, exception: Invalid file. The FileType.UNK file type is not supported in partition.

while in my conda env I seem to have it

%pip list | grep libmagic
libmagic                      1.0

but I do not have the python-libmagic. When I try to install it:

pip install python-libmagic

I keep getting error:

Collecting python-libmagic
  Using cached python_libmagic-0.4.0-py3-none-any.whl
Collecting cffi==1.7.0 (from python-libmagic)
  Using cached cffi-1.7.0.tar.gz (400 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pycparser in /opt/conda/envs/cho_env/lib/python3.10/site-packages (from cffi==1.7.0->python-libmagic) (2.21)
Building wheels for collected packages: cffi
  Building wheel for cffi (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [254 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/ffiplatform.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/cffi_opcode.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/verifier.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/commontypes.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/vengine_gen.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/setuptools_ext.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/vengine_cpy.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/recompiler.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/cparser.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/lock.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/backend_ctypes.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/__init__.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/model.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/api.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/_cffi_include.h -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/parse_c_type.h -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/_embedding.h -> build/lib.linux-x86_64-cpython-310/cffi
      running build_ext
      building '_cffi_backend' extension
      creating build/temp.linux-x86_64-cpython-310
      creating build/temp.linux-x86_64-cpython-310/c
      gcc -pthread -B /opt/conda/envs/cho_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -DUSE__THREAD -I/usr/include/ffi -I/usr/include/libffi -I/opt/conda/envs/cho_env/include/python3.10 -c c/_cffi_backend.c -o build/temp.linux-x86_64-cpython-310/c/_cffi_backend.o
      In file included from c/_cffi_backend.c:274:
      c/minibuffer.h: In function ‘mb_ass_slice’:
      c/minibuffer.h:66:5: warning: ‘PyObject_AsReadBuffer’ is deprecated [-Wdeprecated-declarations]
         66 |     if (PyObject_AsReadBuffer(other, &buffer, &buffer_len) < 0)
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/genobject.h:12,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:110,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/abstract.h:343:17: note: declared here
        343 | PyAPI_FUNC(int) PyObject_AsReadBuffer(PyObject *obj,
            |                 ^~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:277:
      c/file_emulator.h: In function ‘PyFile_AsFile’:
      c/file_emulator.h:54:14: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
         54 |         mode = PyText_AsUTF8(ob_mode);
            |              ^
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_AsSingleWideChar’:
      c/wchar_helper.h:83:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         83 |     Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_SizeAsWideChar’:
      c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:99:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_AsWideChar’:
      c/wchar_helper.h:118:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
        118 |     Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘ctypedescr_dealloc’:
      c/_cffi_backend.c:352:23: error: lvalue required as left operand of assignment
        352 |         Py_REFCNT(ct) = 43;
            |                       ^
      c/_cffi_backend.c:355:23: error: lvalue required as left operand of assignment
        355 |         Py_REFCNT(ct) = 0;
            |                       ^
      c/_cffi_backend.c: In function ‘cast_to_integer_or_char’:
      c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:3331:26: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘b_complete_struct_or_union’:
      c/_cffi_backend.c:4251:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4251 |                 do_align = PyText_GetSize(fname) > 0;
            |                 ^~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:4283:13: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4283 |             if (PyText_GetSize(fname) == 0 &&
            |             ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:4353:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4353 |                 if (PyText_GetSize(fname) > 0) {
            |                 ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘prepare_callback_info_tuple’:
      c/_cffi_backend.c:5214:5: warning: ‘PyEval_InitThreads’ is deprecated [-Wdeprecated-declarations]
       5214 |     PyEval_InitThreads();
            |     ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:130,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/ceval.h:122:37: note: declared here
        122 | Py_DEPRECATED(3.9) PyAPI_FUNC(void) PyEval_InitThreads(void);
            |                                     ^~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘b_callback’:
      c/_cffi_backend.c:5255:5: warning: ‘ffi_prep_closure’ is deprecated: use ffi_prep_closure_loc instead [-Wdeprecated-declarations]
       5255 |     if (ffi_prep_closure(closure, &cif_descr->cif,
            |     ^~
      In file included from c/_cffi_backend.c:15:
      /opt/conda/envs/cho_env/include/ffi.h:347:1: note: declared here
        347 | ffi_prep_closure (ffi_closure*,
            | ^~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      c/ffi_obj.c: In function ‘_ffi_type’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
         72 | # define PyText_AS_UTF8 _PyUnicode_AsString
            |                         ^~~~~~~~~~~~~~~~~~~
      c/ffi_obj.c:191:32: note: in expansion of macro ‘PyText_AS_UTF8’
        191 |             char *input_text = PyText_AS_UTF8(arg);
            |                                ^~~~~~~~~~~~~~
      c/lib_obj.c: In function ‘lib_build_cpython_func’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
         72 | # define PyText_AS_UTF8 _PyUnicode_AsString
            |                         ^~~~~~~~~~~~~~~~~~~
      c/lib_obj.c:129:21: note: in expansion of macro ‘PyText_AS_UTF8’
        129 |     char *libname = PyText_AS_UTF8(lib->l_libname);
            |                     ^~~~~~~~~~~~~~
      c/lib_obj.c: In function ‘lib_build_and_cache_attr’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:71:24: note: in expansion of macro ‘_PyUnicode_AsString’
         71 | # define PyText_AsUTF8 _PyUnicode_AsString   /* PyUnicode_AsUTF8 in Py3.3 */
            |                        ^~~~~~~~~~~~~~~~~~~
      c/lib_obj.c:208:15: note: in expansion of macro ‘PyText_AsUTF8’
        208 |     char *s = PyText_AsUTF8(name);
            |               ^~~~~~~~~~~~~
      In file included from c/cffi1_module.c:16,
                       from c/_cffi_backend.c:6636:
      c/lib_obj.c: In function ‘lib_getattr’:
      c/lib_obj.c:506:7: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        506 |     p = PyText_AsUTF8(name);
            |       ^
      In file included from c/cffi1_module.c:19,
                       from c/_cffi_backend.c:6636:
      c/call_python.c: In function ‘_get_interpstate_dict’:
      c/call_python.c:20:30: error: dereferencing pointer to incomplete type ‘PyInterpreterState’ {aka ‘struct _is’}
         20 |     builtins = tstate->interp->builtins;
            |                              ^~
      c/call_python.c: In function ‘_ffi_def_extern_decorator’:
      c/call_python.c:73:11: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
         73 |         s = PyText_AsUTF8(name);
            |           ^
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for cffi
  Running setup.py clean for cffi
Failed to build cffi
ERROR: Could not build wheels for cffi, which is required to install pyproject.toml-based projects

How can I fix or bypass this?

Upvotes: 3

Views: 6807

Answers (5)

Koki

Reputation: 149

In my case, installing libmagic via Homebrew and python-magic via pip worked.

% brew install libmagic
% pip install python-magic

https://formulae.brew.sh/formula/libmagic
https://pypi.org/project/python-magic/

Upvotes: 0

Michael

Reputation: 1

This worked for me. There are library version support issues between UnstructuredURLLoader and libmagic. You can use SeleniumURLLoader() instead of UnstructuredURLLoader(). Your code above can be modified as follows:

from langchain.document_loaders import SeleniumURLLoader
loader = SeleniumURLLoader(urls=urls)
data = loader.load()
len(data)

Upvotes: 0

Akash Kobal

Reputation: 11

Change the kernel and run the code. If a module error appears again, change the kernel back to the current one.

I solved the issue by changing the kernel.

Upvotes: 0

ioanaB

Reputation: 1

I was able to use UnstructuredURLLoader by first installing the libmagic dev package for Ubuntu and then installing the Python interface to libmagic:

apt-get update && apt-get install -y libmagic-dev
pip install python-magic

Then I can import the library in Python with:

import magic
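
To check whether the pieces are actually visible from the environment the notebook runs in, something like the following might help. It is only a rough sketch: libmagic_status is a hypothetical helper name, and it relies on the standard-library lookups ctypes.util.find_library and importlib.util.find_spec, which may not see a library that a package bundles privately or that lives only inside a conda env.

```python
import ctypes.util
import importlib.util


def libmagic_status():
    """Report whether the libmagic C library and a `magic` Python module are discoverable.

    Hypothetical helper for illustration only; it is not part of langchain
    or unstructured.
    """
    return {
        # Name/path of the shared library found by the system lookup, or None.
        "shared_library": ctypes.util.find_library("magic"),
        # True if some top-level module named `magic` is importable.
        "python_module": importlib.util.find_spec("magic") is not None,
    }


if __name__ == "__main__":
    print(libmagic_status())
```

If "python_module" is False, the Python binding is missing from the active environment. If "shared_library" is None, the system lookup did not find the C library, although a bundled copy could still work.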

Upvotes: 0

Marc

Reputation: 2420

I had the same issue. Root cause: the python-magic library does not include the required binary packages for Windows, macOS, and Linux. However, the python-magic-bin fork does include them.

Note that python-libmagic (which you tried) did not work for me either. Go for python-magic-bin instead.

So, try the following solution (found in this GitHub issue page) which worked for me:

# uninstall what you initially tried, to avoid conflicts
pip uninstall python-libmagic
pip uninstall python-magic 

# install the working one
pip install python-magic-bin

If you are using conda (instead of PyPI), then you can use conda install -c conda-forge libmagic, as per this GH issue page.
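
After reinstalling, a quick smoke test along these lines could confirm that file type detection works before retrying the loader. This is a sketch under assumptions: detect_mime is a hypothetical helper, and it assumes the installed `magic` module is python-magic (or python-magic-bin), which exposes magic.from_buffer(..., mime=True). Other packages that install a module named `magic` have a different API, so the call is guarded.

```python
def detect_mime(data: bytes):
    """Return the MIME type libmagic reports for `data`, or None if detection is unavailable.

    Hypothetical helper for illustration; assumes the python-magic API.
    """
    try:
        # python-magic raises ImportError when it cannot locate libmagic.
        import magic
    except ImportError:
        return None
    try:
        return magic.from_buffer(data, mime=True)
    except Exception:
        # A different `magic` package (without from_buffer) or a broken
        # libmagic install ends up here.
        return None


if __name__ == "__main__":
    print(detect_mime(b"<html><body>hello</body></html>"))
```

A result of None suggests the binding or the underlying library is still not usable in this environment. A string result means detection itself is working.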

Upvotes: 13
