Reputation: 1117
I have the following code (in PyCharm on macOS):
import pandas as pd
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fiddy_states)
And I get the following error:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
  File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
    fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
    keep_default_na=keep_default_na)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
    parser = _parser_dispatch(flav)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Anaconda shows the latest version of lxml (3.8.0) as installed. Despite that, I have tried to reinstall it by 1) running pip install lxml and 2) downloading the lxml wheel corresponding to my Python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl). Neither attempt changed anything. In the second case I am told that it "is not a supported wheel on this platform", even though the Python version is correct (3.6, 64-bit).
I've read similar questions here (even with the same code above, since it's from a tutorial), but the problem still persists.
Upvotes: 56
Views: 171877
Reputation: 1
I faced the same problem. I solved it by installing lxml_html_clean with pip3 install lxml_html_clean
Upvotes: 0
Reputation: 1
When I attempted to install lxml using pip3, I encountered an error. However, all I needed to do was close and then reopen my coding environment.
Upvotes: 0
Reputation: 1
import pandas as pd
from urllib.request import Request, urlopen
url = 'WEB-SITE'
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)
Upvotes: 0
Reputation: 43
I was seeing this issue as well on my RPi.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
    displayed_only=displayed_only,
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
    parser = _parser_dispatch(flav)
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py, I saw that it was attempting to import lxml.etree, so I tried importing that module directly:
>>> from lxml import etree
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory
I searched for that error and found that the following package needed to be installed on the RPi:
sudo apt-get install libxslt
After installing it, I was able to use pandas successfully.
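As this answer shows, the "lxml not found" message from pandas can hide the real reason the import fails (here, a missing shared library). The manual check above can be wrapped in a few lines. This is only a sketch: `import_failure` is a hypothetical helper name, not part of pandas or lxml.

```python
import importlib


def import_failure(module_name):
    """Return None if the module imports cleanly, else the error text."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a missing shared library both end up here.
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # lxml.etree is the module pandas tries to import, per the traceback above.
    print(import_failure("lxml.etree") or "lxml.etree imports fine")
```

If the printed text mentions a `.so` file rather than "No module named", the Python package is present and a system library is what is missing.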
Upvotes: 0
Reputation: 13582
As the OP is using Anaconda, the way to solve this issue is to install lxml by opening a terminal for the environment one is working in (the CMD.exe Prompt on Windows) and running
conda install -c anaconda lxml
(Source)
One can also specify the version, as follows:
conda install -c anaconda lxml=4.8.0
Notes:
pip doesn't manage dependencies the same way conda does and can potentially damage one's installation. I would therefore recommend using it only if conda doesn't work.
pip install lxml
# or
pip install lxml==4.9.1
If one is using pip, already has the package installed, and is still getting errors, one can pass -I (--ignore-installed) and -v (verbose) as follows:
pip install -Iv lxml==4.9.1
The official lxml documentation can be found here.
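To confirm which lxml version the active environment ended up with after any of the commands above, a small check along these lines may help. It is a sketch that assumes Python 3.8 or newer (for `importlib.metadata`); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed for this interpreter.
        return None


if __name__ == "__main__":
    print("lxml:", installed_version("lxml"))
```

A result of None means the interpreter running the script cannot see lxml, whatever conda or pip reported in another environment.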
Upvotes: 0
Reputation: 1
I installed lxml 4.9.1, but it didn't work. So I tried to install lxml 4.8.0 instead, and it worked!
pip install lxml==4.8
Upvotes: 0
Reputation: 11
I had the same problem, and reinstalling lxml did not help. After rereading the error message and tracing the error to ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think the problem lies in the function _importers() in ~/pandas/io/html.py.
Here is the function:
def _importers() -> None:
    # import things we need
    # but make this done on a first use basis
    global _IMPORTS
    if _IMPORTS:
        return
    global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
    bs4 = import_optional_dependency("bs4", errors="ignore")
    _HAS_BS4 = bs4 is not None
    lxml = import_optional_dependency("lxml.etree", errors="ignore")
    _HAS_LXML = lxml is not None
    html5lib = import_optional_dependency("html5lib", errors="ignore")
    _HAS_HTML5LIB = html5lib is not None
    _IMPORTS = True
You can see that for the lxml option it actually tries importing "lxml.etree" instead of "lxml", and any failure of that import is ignored and reported as lxml being missing. This is probably why reinstalling "lxml" did not help.
In conclusion, I think this may be a problem with the pandas version (mine is 1.4.1). For me, a quick solution is to specify flavor='html5lib' in pd.read_html().
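The same availability check can be done outside pandas to decide which flavor to request. The sketch below mirrors the imports in _importers(); `can_import` and `pick_flavor` are hypothetical helper names, and the code assumes that the 'html5lib' flavor needs both bs4 and html5lib to be importable.

```python
import importlib


def can_import(name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def pick_flavor():
    """Return a flavor string for pd.read_html, or None if no parser works."""
    # pandas imports lxml.etree, not just lxml, so test the same module.
    if can_import("lxml.etree"):
        return "lxml"
    # Assumption: the html5lib flavor relies on bs4 as well as html5lib.
    if can_import("bs4") and can_import("html5lib"):
        return "html5lib"
    return None


if __name__ == "__main__":
    print("usable flavor:", pick_flavor())
```

When the result is not None, it could be passed along as pd.read_html(url, flavor=pick_flavor()).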
Upvotes: 1
Reputation: 802
You may have to (re)install some of your libraries: pip install lxml bs4 html5lib
pd.read_html() reads with the 'lxml' library by default, so try another library that you installed above, like pd.read_html(some_url, flavor='html5lib')
Upvotes: 5
Reputation: 1
This error occurs when lxml is not installed, so just go to the terminal and run: pip3 install lxml
Upvotes: 0
Reputation: 451
I got the same error when trying to run some code that was using pandas. I tried some of the suggestions here, but they did not work. Finally, what worked for me was the following two steps:
conda update anaconda
conda install spyder=5.0.5
Now when I restarted Spyder and ran my code it worked fine.
I have only just installed and started using Anaconda, so I don't know the root cause of this issue. My guess is that there was some "cross-connection" with packages I had installed before installing Anaconda, and that after the two steps above everything runs from within the Anaconda environment.
Upvotes: 0
Reputation: 11
I tried to reinstall lxml without any progress.
I ended up uninstalling pandas, reinstalling it, and upgrading it, and that solved my issue!
pip uninstall pandas
pip install pandas
pip3 install --upgrade pandas
Upvotes: 1
Reputation: 853
For people who reached here while using a Jupyter notebook: I restarted the kernel after pip install lxml and the error was gone.
Upvotes: 46
Reputation: 115
In PyCharm, you can go to Settings > Project Interpreter and click the '+' icon.
Find 'lxml' in the list of packages and click the 'Install Package' button below it.
I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4
Upvotes: 2
Reputation: 323
I got the same error. It seems that my python3 was picking up the pandas installed for Python 2 (since I had not installed pandas for Python 3). After doing pip3 install pandas and restarting the notebook, it worked fine.
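One way to see which copy of a package the running interpreter would pick up is to ask the import system where it would load the module from. This is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Compare these paths with the interpreter path to spot a mismatch.
    print("interpreter:", sys.executable)
    for name in ("pandas", "lxml"):
        print(name, "->", module_origin(name))
```

A None result, or a path that belongs to a different Python installation than the interpreter, points to the kind of mix-up described in this answer.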
Upvotes: 5
Reputation: 476594
The first line of the output is:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
This means that you are working with Python 3.6. The package manager for Python 3.x is usually pip3, so you should probably install it with:
pip3 install lxml
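If it is unclear which pip belongs to which interpreter, a common alternative is to invoke pip as a module of the interpreter that runs the script. The sketch below only builds and prints that command; `pip_install_command` is a hypothetical helper name.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the full path of the current Python binary, so the
    # package is installed for exactly this interpreter.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Print the command so it can be copied into a terminal.
    print(" ".join(pip_install_command("lxml")))
```

Running this from the PyCharm project prints a command that targets the same python3.6 binary shown in the first line of the output above.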
Upvotes: 59