Reputation: 1
I have an absolute path for the values of XML files I want to retrieve. The absolute path is in the format of "A/B/C". How can I do this in Python?
Upvotes: 0
Views: 656
Reputation: 23825
Using ElementTree library (Note that my answer uses core python library while the other answers are using external libraries.)
import xml.etree.ElementTree as ET
xml = '''<ROOT><A><B><C>The Value</C></B></A></ROOT>'''
root = ET.fromstring(xml)
print(root.find('./A/B/C').text)
output
The Value
Upvotes: 3
Reputation: 2469
Another method.
from simplified_scrapy import SimplifiedDoc, utils, req
# Basic
xml = '''<ROOT><A><B><C>The Value</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
print (doc.select('A>B>C'))
# Multiple
xml = '''<ROOT><A><B><C>The Value 1</C></B></A><A><B><C>The Value 2</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
# print (doc.selects('A').select('B').select('C'))
print (doc.selects('A').select('B>C'))
# Mixed structure
xml = '''<ROOT><A><other>no B</other></A><A><other></other><B>no C</B></A><A><B><C>The Value</C></B></A></ROOT>'''
doc = SimplifiedDoc(xml)
nodes = doc.selects('A').selects('B').select('C')
for node in nodes:
for c in node:
if c:
print (c)
Result:
{'tag': 'C', 'html': 'The Value'}
[{'tag': 'C', 'html': 'The Value 1'}, {'tag': 'C', 'html': 'The Value 2'}]
{'tag': 'C', 'html': 'The Value'}
Upvotes: 2
Reputation:
You can use lxml which you can install via pip install lxml
.
See also https://lxml.de/xpathxslt.html
from io import StringIO
from lxml import etree
data = '''\
<prestashop>
<combination>
<id>a</id>
<id_product>b</id_product>
<location>c</location>
<ean13>d</ean13>
<isbn>e</isbn>
<upc>f</upc>
<mpn>g</mpn>
</combination>
</prestashop>
'''
xpath = '/prestashop/combination/ean13'
f = StringIO(data)
tree = etree.parse(f)
matches = tree.xpath(xpath)
for e in matches:
print(e.text)
Upvotes: 1