asd

Reputation: 1117

Python: ImportError: lxml not found, please install it

I have the following code (in PyCharm (MacOS)):

import pandas as pd

fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')

print(fiddy_states)

And I get the following error:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
  File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
    fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
    keep_default_na=keep_default_na)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
    parser = _parser_dispatch(flav)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

Anaconda shows the latest version of lxml (3.8.0) as installed. Despite that, I have tried to reinstall it by 1) running pip install lxml and 2) downloading the lxml wheel corresponding to my Python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but in both cases nothing changes (in the second case I am told it is not a supported wheel on this platform, even though the Python version is correct (3.6, 64-bit)).

I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.

Upvotes: 56

Views: 171877

Answers (16)

Farhad

Reputation: 1

I also faced the same problem. I solved it by installing lxml_html_clean with pip3 install lxml_html_clean.

Upvotes: 0

user2756663

Reputation: 31

for me

pip install --upgrade lxml_html_clean

worked

Upvotes: 3

osuya ikechukwu

Reputation: 1

When I attempted to install lxml using pip3, I encountered an error. However, all I needed to do was close and then reopen my coding environment.

Upvotes: 0

Yuriy

Reputation: 1

import pandas as pd
from urllib.request import Request, urlopen

url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)

Upvotes: 0

Geo99M6Z

Reputation: 43

I was seeing this issue as well on my RPi.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
    displayed_only=displayed_only,
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
    parser = _parser_dispatch(flav)
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py it was attempting to use lxml.etree, so I attempted to just use that module

>>> from lxml import etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory

I searched for that error and found that the following packages needed to be installed on the RPi

sudo apt-get install libxslt

After installing I was successfully able to use pandas
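
The check I did by hand above can be wrapped in a small helper. This is only a sketch: diagnose_import is a name I made up for illustration, and it simply reports whatever ImportError Python raises, which is often more specific than the "lxml not found" message from pandas.

```python
import importlib


def diagnose_import(module_name):
    """Try to import a module and return (ok, message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # The message often names the missing shared library or package.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "imported successfully"


if __name__ == "__main__":
    # pandas imports lxml.etree, so that is the module worth testing.
    ok, message = diagnose_import("lxml.etree")
    print("lxml.etree:", message)
```

If the message mentions a shared object file such as libxslt.so.1, the problem is likely a missing system library rather than the Python package itself.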

Upvotes: 0

Gonçalo Peres

Reputation: 13582

As the OP is using Anaconda, the fix is to install lxml with conda: open the CMD.exe Prompt for the environment you are working in and run

conda install -c anaconda lxml

(Source)

One can also do it by specifying the version as follows

conda install -c anaconda lxml=4.8.0

Notes:

  • pip doesn't manage dependencies the same way conda does and can potentially damage one's installation. Therefore, I would recommend using it only if conda doesn't work.

    pip install lxml
    
    # or
    
    pip install lxml==4.9.1
    
  • If one is using pip, already has the package installed, and is still getting errors, one can pass -I (--ignore-installed) and -v as follows

    pip install -Iv lxml==4.9.1
    
  • lxml official documentation can be found here.

  • This is their official GitHub repo.

Upvotes: 0

Bear77777

Reputation: 1

I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!

pip install lxml==4.8

Upvotes: 0

H T

Reputation: 11

I got the same problem. Trying to reinstall lxml does not work. After rereading the error message and tracing the error to ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think the problem lies in the function _importers() in ~/pandas/io/html.py.

Here is the function:

def _importers() -> None:
    # import things we need
    # but make this done on a first use basis

    global _IMPORTS
    if _IMPORTS:
        return

    global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
    bs4 = import_optional_dependency("bs4", errors="ignore")
    _HAS_BS4 = bs4 is not None

    lxml = import_optional_dependency("lxml.etree", errors="ignore")
    _HAS_LXML = lxml is not None

    html5lib = import_optional_dependency("html5lib", errors="ignore")
    _HAS_HTML5LIB = html5lib is not None

    _IMPORTS = True

You can see that for the lxml option, it actually tries to import "lxml.etree" rather than "lxml". So this is probably why reinstalling "lxml" does not help.

In conclusion, I think this may be a problem with the pandas version (mine is 1.4.1). For me, a quick solution is to specify flavor='html5lib' in pd.read_html().
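
A rough sketch of how one might choose the flavor automatically, based on which parsers actually import. The names can_import and pick_flavor are hypothetical helpers, not part of pandas. The sketch assumes, as the pandas documentation describes, that the html5lib flavor needs both html5lib and bs4.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly in this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def pick_flavor(check=can_import):
    """Choose a read_html flavor based on which parsers are importable.

    Returns "lxml", "html5lib", or None if neither parser stack is usable.
    """
    # pandas imports lxml.etree, not just the top-level lxml package.
    if check("lxml.etree"):
        return "lxml"
    # The html5lib flavor needs both html5lib and BeautifulSoup (bs4).
    if check("html5lib") and check("bs4"):
        return "html5lib"
    return None
```

If the result is not None, it can be passed on, for example pd.read_html(some_url, flavor=pick_flavor()). The check parameter is there only so that the logic can be tried out without installing or removing anything.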

Upvotes: 1

Aku

Reputation: 802

  1. You may have to (re)install some of your libraries: pip install lxml bs4 html5lib

  2. pd.read_html() uses the 'lxml' library by default, so try another library that you installed above, for example pd.read_html(some_url, flavor='html5lib')

Upvotes: 5

ashusharma

Reputation: 1

This error occurs when lxml is not installed, so just go to the terminal and run: pip3 install lxml


Upvotes: 0

yankeemike

Reputation: 451

I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :

conda update anaconda
conda install spyder=5.0.5

Now when I restarted Spyder and ran my code it worked fine.

I have just installed and started using Anaconda, so I don't know the root cause of this issue, but my guess is that there was some "cross-connection" with packages I had installed before installing Anaconda, and after running the two steps above everything now runs from within the Anaconda environment.

Upvotes: 0

Heidar Jon

Reputation: 11

I tried to reinstall lxml without any progress.

I ended up uninstalling pandas, then reinstalling and upgrading it, and that solved my issue!

pip uninstall pandas  
pip install pandas
pip3 install --upgrade pandas

Upvotes: 1

EasonL

Reputation: 853

For people who reached this page while using Jupyter Notebook: I restarted the kernel after pip install lxml and the error was gone.

Upvotes: 46

Krish PG

Reputation: 115

You can go to Settings > Project Interpreter > Click on '+' icon
Find 'lxml' from the list of packages and click 'Install Package' button found below.

I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4

Upvotes: 2

Ruxi Zhang

Reputation: 323

I got the same error. It seems that my python3 was pointing to the pandas installed for python2 (since I had not installed pandas for python3). After running pip3 install pandas and restarting the notebook, it worked fine.
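
To see which installation a package is being picked up from, something like the following may help. It is a sketch using only the standard library, and module_origin is a name chosen here for illustration. Run it with the same interpreter (or notebook kernel) that raises the error.

```python
import importlib.util
import sys


def module_origin(module_name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Compare the interpreter path with the paths of the packages.
    print("interpreter:", sys.executable)
    for name in ("pandas", "lxml"):
        print(name, "->", module_origin(name))
```

If the printed paths point into a different Python installation than the interpreter, or a package shows None, that mismatch is a likely cause of the error.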

Upvotes: 5

willeM_ Van Onsem

Reputation: 476594

Based on the fact that the error output starts with the interpreter path:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6

This means that you are working with Python 3.6. The package manager for Python 3 is usually pip3, so you should probably install it with:

pip3 install lxml
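
If it is unclear which pip belongs to which interpreter, one option is to let the interpreter itself invoke pip via python -m pip. The snippet below is a minimal sketch of that idea; pip_install_command and install are hypothetical helper names, and it assumes pip is available for the interpreter running the script.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip for this interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(package))


if __name__ == "__main__":
    # Only prints the command; call install("lxml") to actually run it.
    print("Running under:", sys.executable)
    print("Install with:", " ".join(pip_install_command("lxml")))
```

Running this from the same PyCharm run configuration as the failing script shows the exact interpreter that needs lxml.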

Upvotes: 59
