Skylight

Reputation: 221

Python-3.x SIMPLE XPath Library

I am trying to parse quite simple XML using Python.

Before Python 3 I used the "webscraping" library, which has XPath functionality. It worked very simply:

xpath.search(xml, "//search")    # xml is the XML string, "//search" is the XPath query

It returns the elements matching the given XPath query.

Now I have decided to switch to Python 3, and the library mentioned above doesn't work properly with it (even after running 2to3.py), so I decided to use the built-in xml.etree.ElementTree library.

Probably I'm misunderstanding something, but this is a proper nightmare. It doesn't work the way where you simply pass the XML and an XPath query to a function and get the results back. Instead, you need 10+ lines of code, messing with elements' children etc., and it still doesn't work...

import xml.etree.ElementTree as ET
doc = ET.fromstring(xml)
result = doc.findall("//XPath Query")

raises SyntaxError: cannot use absolute path on element. Adding . in front of //XPath Query doesn't help much either.
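A minimal reproduction of the error, using a small made-up document in place of the real XML. ET.fromstring returns an Element (the root element), not an ElementTree, and Element.findall only accepts paths relative to that element. On XML without a namespace, the leading dot is enough; if .//tag still finds nothing, a default xmlns declaration on the document is the usual suspect (which is what the accepted answer below turns out to be).

```python
import xml.etree.ElementTree as ET

# A small, made-up document used only for illustration (no namespace).
xml = "<root><item>a</item><group><item>b</item></group></root>"
doc = ET.fromstring(xml)  # doc is the <root> Element, not an ElementTree

# An absolute path ("//item") is rejected when searching from an Element.
try:
    doc.findall("//item")
except SyntaxError as exc:
    print("error:", exc)  # error: cannot use absolute path on element

# A path relative to the current element works: ".//item" means
# "any <item> descendant of doc, at any depth".
items = doc.findall(".//item")
print([el.text for el in items])  # ['a', 'b']
```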

Is there some reason why the ElementTree and lxml libraries are so complicated and don't let you simply use XPath, instead of messing around with elements, using a for loop every time, etc.?

Could anyone recommend a simple library for Python 3 that just takes an XPath query and returns the result?

Upvotes: 3

Views: 3597

Answers (2)

Skylight

Reputation: 221

Found the problem now.

My XML response contains the following:

<?xml version="1.0" encoding="utf-8"?>
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <!-- Call-specific Output Fields -->
  <HasMoreOrders> boolean </HasMoreOrders>
  <OrderArray> OrderArrayType
    <Order> OrderType
      <AdjustmentAmount currencyID="CurrencyCodeType"> AmountType (double) </AdjustmentAmount>
      <AmountPaid currencyID="CurrencyCodeType"> AmountType (double) </AmountPaid>
      <AmountSaved currencyID="CurrencyCodeType"> AmountType (double) </AmountSaved>
      <BuyerCheckoutMessage> string </BuyerCheckoutMessage>
      <BuyerUserID> UserIDType (string) </BuyerUserID>
      <CheckoutStatus> CheckoutStatusType
      ...

After parsing that XML:

root = ET.fromstring(xml)
result = root.findall("*")

Every single element it returns has its tag prefixed with {urn:ebay:apis:eBLBaseComponents}.
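To see this in isolation, here is a sketch using a trimmed, made-up version of the eBay response (the element values are placeholders; the XML declaration and the comment are left out). ElementTree stores namespaced tags in "{uri}localname" form, so a bare tag name in a query never matches them.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up stand-in for the eBay GetOrdersResponse.
xml = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <HasMoreOrders>false</HasMoreOrders>
  <OrderArray>
    <Order>
      <BuyerCheckoutMessage>Please ship quickly</BuyerCheckoutMessage>
    </Order>
  </OrderArray>
</GetOrdersResponse>"""
root = ET.fromstring(xml)

# The default xmlns is folded into every tag as "{uri}localname".
print(root.tag)  # {urn:ebay:apis:eBLBaseComponents}GetOrdersResponse

# "*" selects only the direct children of root.
print([child.tag for child in root.findall("*")])

# A bare local name matches nothing: no tag is literally "BuyerCheckoutMessage".
print(root.findall(".//BuyerCheckoutMessage"))  # []
```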

For example, if I need to search for <BuyerCheckoutMessage>:

result = root.findall(".//BuyerCheckoutMessage")

returns nothing, because that element's tag is actually {urn:ebay:apis:eBLBaseComponents}BuyerCheckoutMessage.

Therefore, to find an element, I need to put {urn:ebay:apis:eBLBaseComponents} in front of every tag name in the XPath query.

So the solution is:

result = root.findall(".//{urn:ebay:apis:eBLBaseComponents}BuyerCheckoutMessage")

result[0].text then returns the element's value.
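Writing the full URI into every query works, but find, findall and findtext also accept a namespaces argument that maps a prefix of your choosing to the URI, and since Python 3.8 the {*} wildcard matches a tag in any namespace. A sketch with a made-up sample document; the prefix "e" is arbitrary.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed response; only the default namespace matters here.
xml = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <OrderArray>
    <Order><BuyerCheckoutMessage>Please ship quickly</BuyerCheckoutMessage></Order>
  </OrderArray>
</GetOrdersResponse>"""
root = ET.fromstring(xml)

# Option 1: map a short prefix to the namespace URI and use it in the path.
ns = {"e": "urn:ebay:apis:eBLBaseComponents"}
msgs = root.findall(".//e:BuyerCheckoutMessage", ns)

# Option 2 (Python 3.8+): "{*}" matches the local name in any namespace.
msgs_any = root.findall(".//{*}BuyerCheckoutMessage")

# findtext returns the text of the first match, or None if nothing matches.
text = root.findtext(".//e:BuyerCheckoutMessage", namespaces=ns)
print(text)  # Please ship quickly
```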

Why there isn't simply something like ET.search(xml, "XPath-query") is a complete mystery to me. So much time wasted.
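If what you want is closer to the search(xml, query) call from the old library, one option is to strip the namespaces after parsing so that plain tag names match. In the sketch below, search and strip_namespaces are hypothetical helpers, not part of ElementTree; they only rewrite element tags (namespaced attributes are left alone), and queries are still limited to ElementTree's XPath subset. For full XPath 1.0, the third-party lxml package offers an .xpath() method on its elements, though it too needs a prefix mapping for namespaced elements.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Hypothetical helper: drop "{uri}" prefixes from every tag under root, in place."""
    for el in root.iter():
        # Comments/processing instructions (if a parser kept them) have a non-str tag.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


def search(xml_string, query):
    """Hypothetical wrapper: parse, drop namespaces, then run an ElementTree path query."""
    root = strip_namespaces(ET.fromstring(xml_string))
    if query.startswith("//"):
        # Element.findall only accepts relative paths: "//b" -> ".//b".
        # Note this never matches the root element itself.
        query = "." + query
    return root.findall(query)


# Made-up document with a default namespace.
xml = '<r xmlns="urn:example"><a><b>1</b></a><b>2</b></r>'
print([el.text for el in search(xml, "//b")])  # ['1', '2']
```

Stripping namespaces is convenient for quick scraping, but it throws information away; if two namespaces share a local name, passing a namespaces mapping to findall is safer.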

Upvotes: 2

Fredrik Håård

Reputation: 2925

Using the example xml from http://docs.python.org/2/library/xml.etree.elementtree.html, searching seems to work fine:

>>> import xml.etree.ElementTree as ET
>>> xml = """..."""
>>> doc = ET.fromstring(xml)
>>> doc.findall(".//rank")
[<Element 'rank' at 0x10199ebd0>, <Element 'rank' at 0x10199e210>, <Element 'rank' at 0x10199e4d0>]
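The same search as a self-contained script, with a small made-up document standing in for the docs example. Because this XML has no xmlns declaration, plain tag names match, which is the key difference from the eBay response in the question.

```python
import xml.etree.ElementTree as ET

# Small stand-in document with no namespace (values are made up).
xml = """<data>
  <country name="A"><rank>1</rank></country>
  <country name="B"><rank>4</rank></country>
  <country name="C"><rank>68</rank></country>
</data>"""
doc = ET.fromstring(xml)

ranks = doc.findall(".//rank")
print([r.text for r in ranks])  # ['1', '4', '68']

# ElementTree's XPath subset also supports attribute predicates:
print(doc.find("country[@name='B']/rank").text)  # 4
```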

Or if you want to search from root explicitly:

>>> ET.ElementTree(doc).findall('//rank')
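On an ElementTree (rather than an Element), a path starting with "/" is accepted, but CPython's ElementTree.findall rewrites it to ".//rank" and issues a FutureWarning. A small sketch that records the warning, using a made-up one-line document:

```python
import warnings
import xml.etree.ElementTree as ET

doc = ET.fromstring("<data><country><rank>1</rank></country></data>")
tree = ET.ElementTree(doc)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ranks = tree.findall("//rank")  # treated as ".//rank"

print(len(ranks))  # 1
print([w.category.__name__ for w in caught])

# Writing the relative form directly gives the same result without a warning.
print(len(tree.findall(".//rank")))  # 1
```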

Upvotes: 2
