Yaeli778

Reputation: 215

Parsing XML attributes with nested namespaces with lxml

I want to extract the Code attributes of the Country elements inside RankByCountry. How can I do it?

In other words, I want to print the list ['GB', 'US', 'O'].

<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
<aws:RequestId>122bfdc6-ae8e-d2a2-580e-3841ab33b966</aws:RequestId>
</aws:OperationRequest>
<aws:UrlInfoResult>
<aws:Alexa>
 <aws:TrafficData>
  <aws:DataUrl type="canonical">androidjones.com/</aws:DataUrl>
  <aws:RankByCountry>
    <aws:Country Code="GB">
      <aws:Rank>80725</aws:Rank>
      <aws:Contribution>
        <aws:PageViews>30.6%</aws:PageViews>
        <aws:Users>41.3%</aws:Users>
      </aws:Contribution>
    </aws:Country>
    <aws:Country Code="US">
      <aws:Rank>354356</aws:Rank>
      <aws:Contribution>
        <aws:PageViews>39.1%</aws:PageViews>
        <aws:Users>28.9%</aws:Users>
      </aws:Contribution>
    </aws:Country>
    <aws:Country Code="O">
      <aws:Rank/>
      <aws:Contribution>
        <aws:PageViews>30.2%</aws:PageViews>
        <aws:Users>29.8%</aws:Users>
      </aws:Contribution>
    </aws:Country>
  </aws:RankByCountry>
 </aws:TrafficData>
</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse>

I have already tried:

namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}
RankByCountry = tree.xpath("//aws:Country/Code", namespaces=namespaces)

but it returned nothing.

I also tried:

for country in tree.xpath('//Country'):
    for attrib in country.attrib:
        print('@' + attrib + '=' + country.attrib[attrib])

Upvotes: 1

Views: 627

Answers (1)

hek2mgl

Reputation: 158230

The document looks odd because it declares the aws namespace prefix twice. Inside <aws:Response>, the inner declaration overrides the outer one, so for the Country elements you need to use the more specific (inner) namespace URI. Your namespaces mapping already does this correctly.
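To see the shadowing for yourself, here is a small sketch that parses a trimmed-down copy of the response (my own shortened version, not the full document) and asks lxml which namespace each element ends up in. The variable names are only for illustration.

```python
from lxml import etree

# Trimmed copy of the response: the same prefix "aws" is declared twice.
xml = b"""<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:Country Code="GB"/>
</aws:Response>
</aws:UrlInfoResponse>"""

root = etree.fromstring(xml)
country = root[0][0]  # UrlInfoResponse -> Response -> Country

# The root uses the outer declaration; Country uses the inner one.
outer_ns = etree.QName(root).namespace
country_ns = etree.QName(country).namespace

print(outer_ns)    # namespace of <aws:UrlInfoResponse>
print(country_ns)  # namespace of <aws:Country>
```

The second URI is the one that has to go into the namespaces mapping when you query for Country.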

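The problem is the XPath expression itself. Code is an attribute, so it has to be selected with @Code; a step like /Code looks for a child element named Code instead. The expression should look like this: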

for country in tree.xpath('//aws:RankByCountry/aws:Country/@Code', namespaces=namespaces):
    print(country) 

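Note that <aws:RankByCountry> has no Code attribute, but <aws:Country> does. An unprefixed query such as //Country also matches nothing, because in XPath an unprefixed name refers to an element in no namespace.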
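For completeness, here is a self-contained sketch that should be runnable as-is. It assumes the XML is available as a bytes literal (I cut the response down to the relevant elements) and that it is parsed with etree.fromstring; adapt it to however you actually load the document.

```python
from lxml import etree

# Shortened version of the response from the question.
xml = b"""<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:RankByCountry>
<aws:Country Code="GB"/>
<aws:Country Code="US"/>
<aws:Country Code="O"/>
</aws:RankByCountry>
</aws:Response>
</aws:UrlInfoResponse>"""

tree = etree.fromstring(xml)

# Map the prefix used in the XPath to the inner (more specific) namespace URI.
namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}

# @Code selects the attribute; lxml returns the values as string-like objects.
result = tree.xpath("//aws:RankByCountry/aws:Country/@Code", namespaces=namespaces)
codes = [str(code) for code in result]

print(codes)  # ['GB', 'US', 'O']
```

The prefix in the namespaces dict does not have to match the one in the document; only the URI matters.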

Upvotes: 3
