Reputation: 2921
I tried to use UnstructuredURLLoader as below:
from langchain.document_loaders import UnstructuredURLLoader
loaders = UnstructuredURLLoader(urls=urls)
data = loaders.load()
but some pages report the following:
libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://wellfound.com/company/chorus-one, exception: Invalid file. The FileType.UNK file type is not supported in partition.
even though libmagic appears to be installed in my conda env:
%pip list | grep libmagic
libmagic 1.0
However, I do not have python-libmagic. When I try to install it:
pip install python-libmagic
I keep getting this error:
Collecting python-libmagic
Using cached python_libmagic-0.4.0-py3-none-any.whl
Collecting cffi==1.7.0 (from python-libmagic)
Using cached cffi-1.7.0.tar.gz (400 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: pycparser in /opt/conda/envs/cho_env/lib/python3.10/site-packages (from cffi==1.7.0->python-libmagic) (2.21)
Building wheels for collected packages: cffi
Building wheel for cffi (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [254 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/cffi
copying cffi/ffiplatform.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/cffi_opcode.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/verifier.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/commontypes.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/vengine_gen.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/setuptools_ext.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/vengine_cpy.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/recompiler.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/cparser.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/lock.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/backend_ctypes.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/__init__.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/model.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/api.py -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/_cffi_include.h -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/parse_c_type.h -> build/lib.linux-x86_64-cpython-310/cffi
copying cffi/_embedding.h -> build/lib.linux-x86_64-cpython-310/cffi
running build_ext
building '_cffi_backend' extension
creating build/temp.linux-x86_64-cpython-310
creating build/temp.linux-x86_64-cpython-310/c
gcc -pthread -B /opt/conda/envs/cho_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -DUSE__THREAD -I/usr/include/ffi -I/usr/include/libffi -I/opt/conda/envs/cho_env/include/python3.10 -c c/_cffi_backend.c -o build/temp.linux-x86_64-cpython-310/c/_cffi_backend.o
In file included from c/_cffi_backend.c:274:
c/minibuffer.h: In function ‘mb_ass_slice’:
c/minibuffer.h:66:5: warning: ‘PyObject_AsReadBuffer’ is deprecated [-Wdeprecated-declarations]
66 | if (PyObject_AsReadBuffer(other, &buffer, &buffer_len) < 0)
| ^~
In file included from /opt/conda/envs/cho_env/include/python3.10/genobject.h:12,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:110,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/abstract.h:343:17: note: declared here
343 | PyAPI_FUNC(int) PyObject_AsReadBuffer(PyObject *obj,
| ^~~~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:277:
c/file_emulator.h: In function ‘PyFile_AsFile’:
c/file_emulator.h:54:14: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
54 | mode = PyText_AsUTF8(ob_mode);
| ^
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h: In function ‘_my_PyUnicode_AsSingleWideChar’:
c/wchar_helper.h:83:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
83 | Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
| ^~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
| ^~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
84 | if (PyUnicode_GET_SIZE(unicode) == 1) {
| ^~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h:84:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
84 | if (PyUnicode_GET_SIZE(unicode) == 1) {
| ^~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
| ^~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
84 | if (PyUnicode_GET_SIZE(unicode) == 1) {
| ^~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h: In function ‘_my_PyUnicode_SizeAsWideChar’:
c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
99 | Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
| ^~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h:99:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
99 | Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
| ^~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
| ^~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
99 | Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
| ^~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from c/_cffi_backend.c:281:
c/wchar_helper.h: In function ‘_my_PyUnicode_AsWideChar’:
c/wchar_helper.h:118:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
118 | Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
| ^~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
| ^~~~~~~~~~~~~~~~~~~
c/_cffi_backend.c: In function ‘ctypedescr_dealloc’:
c/_cffi_backend.c:352:23: error: lvalue required as left operand of assignment
352 | Py_REFCNT(ct) = 43;
| ^
c/_cffi_backend.c:355:23: error: lvalue required as left operand of assignment
355 | Py_REFCNT(ct) = 0;
| ^
c/_cffi_backend.c: In function ‘cast_to_integer_or_char’:
c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
3331 | PyUnicode_GET_SIZE(ob), ct->ct_name);
| ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
c/_cffi_backend.c:3331:26: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
3331 | PyUnicode_GET_SIZE(ob), ct->ct_name);
| ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
| ^~~~~~~~~~~~~~~~~~~
c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
3331 | PyUnicode_GET_SIZE(ob), ct->ct_name);
| ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
c/_cffi_backend.c: In function ‘b_complete_struct_or_union’:
c/_cffi_backend.c:4251:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
4251 | do_align = PyText_GetSize(fname) > 0;
| ^~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
| ^~~~~~~~~~~~~~~~~
c/_cffi_backend.c:4283:13: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
4283 | if (PyText_GetSize(fname) == 0 &&
| ^~
In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
| ^~~~~~~~~~~~~~~~~
c/_cffi_backend.c:4353:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
4353 | if (PyText_GetSize(fname) > 0) {
| ^~
In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
| ^~~~~~~~~~~~~~~~~
c/_cffi_backend.c: In function ‘prepare_callback_info_tuple’:
c/_cffi_backend.c:5214:5: warning: ‘PyEval_InitThreads’ is deprecated [-Wdeprecated-declarations]
5214 | PyEval_InitThreads();
| ^~~~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:130,
from c/_cffi_backend.c:2:
/opt/conda/envs/cho_env/include/python3.10/ceval.h:122:37: note: declared here
122 | Py_DEPRECATED(3.9) PyAPI_FUNC(void) PyEval_InitThreads(void);
| ^~~~~~~~~~~~~~~~~~
c/_cffi_backend.c: In function ‘b_callback’:
c/_cffi_backend.c:5255:5: warning: ‘ffi_prep_closure’ is deprecated: use ffi_prep_closure_loc instead [-Wdeprecated-declarations]
5255 | if (ffi_prep_closure(closure, &cif_descr->cif,
| ^~
In file included from c/_cffi_backend.c:15:
/opt/conda/envs/cho_env/include/ffi.h:347:1: note: declared here
347 | ffi_prep_closure (ffi_closure*,
| ^~~~~~~~~~~~~~~~
In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
from c/_cffi_backend.c:2:
c/ffi_obj.c: In function ‘_ffi_type’:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
| ^~~~~~~~~~~~~~~~
c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
72 | # define PyText_AS_UTF8 _PyUnicode_AsString
| ^~~~~~~~~~~~~~~~~~~
c/ffi_obj.c:191:32: note: in expansion of macro ‘PyText_AS_UTF8’
191 | char *input_text = PyText_AS_UTF8(arg);
| ^~~~~~~~~~~~~~
c/lib_obj.c: In function ‘lib_build_cpython_func’:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
| ^~~~~~~~~~~~~~~~
c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
72 | # define PyText_AS_UTF8 _PyUnicode_AsString
| ^~~~~~~~~~~~~~~~~~~
c/lib_obj.c:129:21: note: in expansion of macro ‘PyText_AS_UTF8’
129 | char *libname = PyText_AS_UTF8(lib->l_libname);
| ^~~~~~~~~~~~~~
c/lib_obj.c: In function ‘lib_build_and_cache_attr’:
/opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
| ^~~~~~~~~~~~~~~~
c/_cffi_backend.c:71:24: note: in expansion of macro ‘_PyUnicode_AsString’
71 | # define PyText_AsUTF8 _PyUnicode_AsString /* PyUnicode_AsUTF8 in Py3.3 */
| ^~~~~~~~~~~~~~~~~~~
c/lib_obj.c:208:15: note: in expansion of macro ‘PyText_AsUTF8’
208 | char *s = PyText_AsUTF8(name);
| ^~~~~~~~~~~~~
In file included from c/cffi1_module.c:16,
from c/_cffi_backend.c:6636:
c/lib_obj.c: In function ‘lib_getattr’:
c/lib_obj.c:506:7: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
506 | p = PyText_AsUTF8(name);
| ^
In file included from c/cffi1_module.c:19,
from c/_cffi_backend.c:6636:
c/call_python.c: In function ‘_get_interpstate_dict’:
c/call_python.c:20:30: error: dereferencing pointer to incomplete type ‘PyInterpreterState’ {aka ‘struct _is’}
20 | builtins = tstate->interp->builtins;
| ^~
c/call_python.c: In function ‘_ffi_def_extern_decorator’:
c/call_python.c:73:11: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
73 | s = PyText_AsUTF8(name);
| ^
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for cffi
Running setup.py clean for cffi
Failed to build cffi
ERROR: Could not build wheels for cffi, which is required to install pyproject.toml-based projects
How can I fix or bypass this?
Upvotes: 3
Views: 6807
Reputation: 149
In my case, installing libmagic via Homebrew and python-magic via pip worked.
% brew install libmagic
% pip install python-magic
https://formulae.brew.sh/formula/libmagic
https://pypi.org/project/python-magic/
Upvotes: 0
Reputation: 1
This worked for me. UnstructuredURLLoader has library version support issues related to libmagic. You can use SeleniumURLLoader() instead of UnstructuredURLLoader(). Your code above can be modified as follows:
from langchain.document_loaders import SeleniumURLLoader
loader = SeleniumURLLoader(urls=urls)
data = loader.load()
len(data)
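A minimal sketch of how the two loaders could be combined, so that only the URLs the first loader cannot handle go through the second. The names `load_with_fallback`, `make_primary`, and `make_fallback` are hypothetical and introduced here only for illustration. The one assumption about a loader is that it is built from a list of URLs and exposes `load()`, as both loaders in this thread do. Whether a failing URL raises or simply yields no documents depends on the loader's settings, so the sketch treats both cases as a miss.

```python
def load_with_fallback(urls, make_primary, make_fallback):
    """Load each URL with the primary loader, retrying with the fallback on a miss.

    make_primary / make_fallback are hypothetical factories: each takes a list
    of URLs and returns an object with a load() method.
    """
    docs = []
    failed = []
    for url in urls:
        loaded = []
        for make_loader in (make_primary, make_fallback):
            try:
                loaded = list(make_loader([url]).load())
            except Exception:
                # A loader that raises is treated the same as one that returns nothing.
                loaded = []
            if loaded:
                break
        if loaded:
            docs.extend(loaded)
        else:
            failed.append(url)
    return docs, failed
```

With LangChain installed, this could be called as `load_with_fallback(urls, lambda u: UnstructuredURLLoader(urls=u), lambda u: SeleniumURLLoader(urls=u))`. Loading one URL at a time is slower, but it makes it clear which pages still fail with both loaders.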
Upvotes: 0
Reputation: 11
Switch to a different kernel and run the code. If a module error appears again, switch back to the current kernel.
Changing the kernel solved the issue for me.
Upvotes: 0
Reputation: 1
I was able to use UnstructuredURLLoader by first installing the libmagic dev package for Ubuntu and then installing the Python interface to libmagic:
apt-get update && apt-get install -y libmagic-dev
pip install python-magic
After that, I can import the library in Python:
import magic
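Before retrying the loader, it may help to check which half of the setup is missing. The sketch below uses only the standard library. `libmagic_status` is a hypothetical helper name, and the check is a heuristic: `ctypes.util.find_library` searches the system's usual library locations, so it may report `False` for a libmagic that lives only inside a conda environment or is bundled inside a Python package.

```python
import ctypes.util
import importlib.util


def libmagic_status():
    """Report whether a `magic` Python module and a native libmagic are discoverable."""
    return {
        # True if some package providing a top-level `magic` module is installed.
        "python_binding": importlib.util.find_spec("magic") is not None,
        # True if the system linker search paths expose a library named "magic".
        "native_library": ctypes.util.find_library("magic") is not None,
    }


if __name__ == "__main__":
    print(libmagic_status())
```

If `python_binding` is `False`, the pip step is the one that is missing; if only `native_library` is `False`, the system package is the likelier culprit.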
Upvotes: 0
Reputation: 2420
Got the same issue. Root cause: the python-magic library does not include the required binary packages for Windows, macOS and Linux. However, the python-magic-bin fork does include them.
Note that python-libmagic (which you have tried) did not work for me either. Go for python-magic-bin instead.
So, try the following solution (found in this GitHub issue page) which worked for me:
# uninstall what you initially tried, to avoid conflicts
pip uninstall python-libmagic
pip uninstall python-magic
# install the working one
pip install python-magic-bin
If you are using conda (instead of PyPI), then you can use conda install -c conda-forge libmagic, as per this GH issue page.
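After reinstalling, a quick way to confirm that file type detection actually works is to run it on a small in-memory sample. This is a sketch that assumes the installed `magic` module is the one from python-magic or python-magic-bin, which exposes `magic.from_buffer`; `detect_mime` is a hypothetical helper name. Other packages that also install a module called `magic` may have a different API, in which case the function returns `None`.

```python
def detect_mime(sample: bytes):
    """Return the MIME type libmagic reports for `sample`, or None if unavailable."""
    try:
        # Importing python-magic fails if it cannot locate the native libmagic.
        import magic

        return magic.from_buffer(sample, mime=True)
    except (ImportError, AttributeError, OSError):
        # Missing binding, missing native library, or a different `magic` module.
        return None


if __name__ == "__main__":
    print(detect_mime(b"<html><body>hello</body></html>"))
```

A `None` result means the binding or the native library is still not usable from this interpreter, so the loader would likely keep printing the libmagic warning.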
Upvotes: 13