Reputation: 3749
Python 3.5
Here is the code:
import urllib.request
from xml.etree import ElementTree as ET

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'

def conectar(url):
    page = urllib.request.urlopen(url)
    return page.read()

root = ET.fromstring(conectar(url))
s = root.findall("//*[contains(.,'21/')]")
I need to extract the elements that contain '21/', but it raises this error:
Traceback (most recent call last):
  File "crawler.py", line 11, in <module>
    root = ET.fromstring(conectar(url))
  File "/home/rg3915/.pyenv/versions/3.5.0/lib/python3.5/xml/etree/ElementTree.py", line 1321, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: unbound prefix: line 146, column 8
I don't know how to fix this error.
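Even once the page parses, the findall() call above would still fail: ElementTree implements only a limited subset of XPath, with no contains() function, and it raises SyntaxError for a path that starts with // on an element. A minimal sketch of doing the same filtering in plain Python instead, assuming the goal is every element whose text contains '21/'. The helper name elements_containing and the sample markup are made up for illustration; the markup is not the real SAT page.

```python
import xml.etree.ElementTree as ET


def elements_containing(root, needle):
    """Return every element under root (inclusive) whose own text contains needle.

    ElementTree's findall() has no XPath contains(), so the filtering
    is done in Python over root.iter(). Only el.text (the text before
    the first child) is checked.
    """
    return [el for el in root.iter() if el.text and needle in el.text]


# Hypothetical sample markup, not taken from the SAT page.
sample = "<table><tr><td>21/03/2016</td><td>18.05</td></tr></table>"
root = ET.fromstring(sample)
matches = elements_containing(root, "21/")
print([el.text for el in matches])  # ['21/03/2016']
```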
Upvotes: 0
Views: 353
Reputation: 26
You could start with:
import urllib2
from bs4 import BeautifulSoup

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
response = urllib2.urlopen(url)
html = response.read()

# html.parser is lenient, so the undeclared gcse: prefix does not stop it
dom = BeautifulSoup(html, 'html.parser')
tables = dom.find_all("table")
if tables:
    table = tables[0]  # first table on the page
    print table
(Tested in Python 2.7.)
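The snippet above is Python 2 (urllib2, print statement), while the question uses Python 3.5. A possible Python 3 equivalent, assuming the beautifulsoup4 package is installed; fetch and first_table are hypothetical helper names, not part of any library.

```python
import urllib.request

from bs4 import BeautifulSoup

URL = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'


def fetch(url):
    """Download the page and return its raw bytes (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def first_table(html):
    """Return the first <table> tag in html, or None if there is none.

    'html.parser' is lenient, so undeclared prefixes such as <gcse:search>
    do not stop it the way they stop ElementTree.
    """
    dom = BeautifulSoup(html, 'html.parser')
    return dom.find('table')
```

Usage would be something like print(first_table(fetch(URL))). Keeping the download separate from the parsing makes the parsing step easy to try on a saved copy of the page.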
Upvotes: 1
Reputation: 9523
While the document you are trying to parse claims to be XHTML, it is not valid XML because of an unbound prefix:
<gcse:search></gcse:search>
The gcse namespace prefix is never declared in the document (there is no xmlns:gcse attribute), so a strict XML parser such as ElementTree rejects it.
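The error is easy to reproduce with a tiny fragment. In this minimal sketch the first string uses gcse: without declaring it and fails the same way as the SAT page, while the second binds the prefix with an xmlns:gcse attribute and parses. The namespace URI http://example.com/gcse is a placeholder, not Google's real one.

```python
import xml.etree.ElementTree as ET

# The gcse: prefix is used but never declared, as on the SAT page.
undeclared = '<html><body><gcse:search></gcse:search></body></html>'
# Same markup, but the prefix is bound to a placeholder namespace URI.
declared = ('<html xmlns:gcse="http://example.com/gcse">'
            '<body><gcse:search></gcse:search></body></html>')

try:
    ET.fromstring(undeclared)
    error = None
except ET.ParseError as exc:
    error = str(exc)
print(error)  # starts with "unbound prefix"

root = ET.fromstring(declared)
# ElementTree expands a bound prefix to {namespace-uri}localname.
print(root.find('body')[0].tag)  # {http://example.com/gcse}search
```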
BeautifulSoup would probably be much better suited for what you are trying to do, because it is not fussy about the document being 100% valid.
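BeautifulSoup's 'html.parser' backend is built on Python's standard html.parser module, which treats gcse:search as just another tag name instead of a namespaced one. A stdlib-only sketch of that leniency, assuming the goal is the text chunks containing '21/'; the class name TextCollector and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text chunk that contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.found = []

    def handle_data(self, data):
        # Called with the text that appears between tags.
        if self.needle in data:
            self.found.append(data.strip())


# Hypothetical markup: the undeclared gcse: prefix that breaks ElementTree is fine here.
sample = ('<html><body><gcse:search></gcse:search>'
          '<table><tr><td>21/03/2016</td><td>18.05</td></tr></table>'
          '</body></html>')
collector = TextCollector('21/')
collector.feed(sample)
collector.close()
print(collector.found)  # ['21/03/2016']
```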
Upvotes: 1