Julian Baehr

Reputation: 27

How to parse this XML response in Python?

This is my XML file:

<?xml version="1.0" ?>
<Items>
    <Item>
        <ASIN>3570102769</ASIN>
        <DetailPageURL>http://www.amazon.de/Inside-IS-Tage-Islamischen-Staat/dp/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3570102769</DetailPageURL>
        <ItemLinks>
            <ItemLink>
                <Description>Add To Wishlist</Description>
                <URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3570102769%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>Tell A Friend</Description>
                <URL>http://www.amazon.de/gp/pdp/taf/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Customer Reviews</Description>
                <URL>http://www.amazon.de/review/product/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Offers</Description>
                <URL>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
            </ItemLink>
        </ItemLinks>
        <ItemAttributes>
            <Author>Jürgen Todenhöfer</Author>
            <Binding>Gebundene Ausgabe</Binding>
            <EAN>9783570102763</EAN>
            <EANList>
                <EANListElement>9783570102763</EANListElement>
            </EANList>
            <ISBN>3570102769</ISBN>
            <IsEligibleForTradeIn>1</IsEligibleForTradeIn>
            <ItemDimensions>
                <Height Units="hundredths-inches">874</Height>
                <Length Units="hundredths-inches">575</Length>
                <Width Units="hundredths-inches">126</Width>
            </ItemDimensions>
            <Label>C. Bertelsmann Verlag</Label>
            <Languages>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Published</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Original</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Unbekannt</Type>
                </Language>
            </Languages>
            <ListPrice>
                <Amount>1799</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 17,99</FormattedPrice>
            </ListPrice>
            <Manufacturer>C. Bertelsmann Verlag</Manufacturer>
            <ManufacturerMinimumAge Units="months">192</ManufacturerMinimumAge>
            <NumberOfPages>288</NumberOfPages>
            <PackageDimensions>
                <Height Units="hundredths-inches">118</Height>
                <Length Units="hundredths-inches">567</Length>
                <Weight Units="hundredths-pounds">93</Weight>
                <Width Units="hundredths-inches">252</Width>
            </PackageDimensions>
            <PackageQuantity>1</PackageQuantity>
            <ProductGroup>Book</ProductGroup>
            <ProductTypeName>ABIS_BOOK</ProductTypeName>
            <PublicationDate>2015-04-27</PublicationDate>
            <Publisher>C. Bertelsmann Verlag</Publisher>
            <Studio>C. Bertelsmann Verlag</Studio>
            <Title>Inside IS - 10 Tage im 'Islamischen Staat'</Title>
            <TradeInValue>
                <Amount>930</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 9,30</FormattedPrice>
            </TradeInValue>
        </ItemAttributes>
        <OfferSummary>
            <LowestNewPrice>
                <Amount>1799</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 17,99</FormattedPrice>
            </LowestNewPrice>
            <LowestUsedPrice>
                <Amount>1390</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 13,90</FormattedPrice>
            </LowestUsedPrice>
            <LowestCollectiblePrice>
                <Amount>4999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 49,99</FormattedPrice>
            </LowestCollectiblePrice>
            <TotalNew>56</TotalNew>
            <TotalUsed>8</TotalUsed>
            <TotalCollectible>1</TotalCollectible>
            <TotalRefurbished>0</TotalRefurbished>
        </OfferSummary>
        <Offers>
            <TotalOffers>1</TotalOffers>
            <TotalOfferPages>1</TotalOfferPages>
            <MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</MoreOffersUrl>
            <Offer>
                <OfferAttributes>
                    <Condition>New</Condition>
                </OfferAttributes>
                <OfferListing>
                    <OfferListingId>9KHCZj9qtL6ucVBPASfXaryQjU8tWbc0n%2F3F4F7GraOKW6Csji2OxpD93%2FkoHwgIGQctlnrtx4RWIeJULAcvvsFhiopFi08JdsZ%2FeO3u6g0%3D</OfferListingId>
                    <Price>
                        <Amount>1799</Amount>
                        <CurrencyCode>EUR</CurrencyCode>
                        <FormattedPrice>EUR 17,99</FormattedPrice>
                    </Price>
                    <Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
                    <AvailabilityAttributes>
                        <AvailabilityType>now</AvailabilityType>
                        <MinimumHours>0</MinimumHours>
                        <MaximumHours>0</MaximumHours>
                    </AvailabilityAttributes>
                    <IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
                </OfferListing>
            </Offer>
        </Offers>
    </Item>
    <Item>
        <ASIN>3813506479</ASIN>
        <DetailPageURL>http://www.amazon.de/Altes-Land-Roman-D%C3%B6rte-Hansen/dp/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3813506479</DetailPageURL>
        <ItemLinks>
            <ItemLink>
                <Description>Add To Wishlist</Description>
                <URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3813506479%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>Tell A Friend</Description>
                <URL>http://www.amazon.de/gp/pdp/taf/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Customer Reviews</Description>
                <URL>http://www.amazon.de/review/product/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
            <ItemLink>
                <Description>All Offers</Description>
                <URL>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
            </ItemLink>
        </ItemLinks>
        <ItemAttributes>
            <Author>Dörte Hansen</Author>
            <Binding>Gebundene Ausgabe</Binding>
            <EAN>9783813506471</EAN>
            <EANList>
                <EANListElement>9783813506471</EANListElement>
            </EANList>
            <ISBN>3813506479</ISBN>
            <IsEligibleForTradeIn>1</IsEligibleForTradeIn>
            <ItemDimensions>
                <Height Units="hundredths-inches">870</Height>
                <Length Units="hundredths-inches">567</Length>
                <Width Units="hundredths-inches">114</Width>
            </ItemDimensions>
            <Label>Albrecht Knaus Verlag</Label>
            <Languages>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Published</Type>
                </Language>
                <Language>
                    <Name>Deutsch</Name>
                    <Type>Original</Type>
                </Language>
            </Languages>
            <ListPrice>
                <Amount>1999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 19,99</FormattedPrice>
            </ListPrice>
            <Manufacturer>Albrecht Knaus Verlag</Manufacturer>
            <NumberOfPages>288</NumberOfPages>
            <PackageDimensions>
                <Height Units="hundredths-inches">118</Height>
                <Length Units="hundredths-inches">858</Length>
                <Weight Units="hundredths-pounds">101</Weight>
                <Width Units="hundredths-inches">559</Width>
            </PackageDimensions>
            <ProductGroup>Book</ProductGroup>
            <ProductTypeName>ABIS_BOOK</ProductTypeName>
            <PublicationDate>2015-02-16</PublicationDate>
            <Publisher>Albrecht Knaus Verlag</Publisher>
            <Studio>Albrecht Knaus Verlag</Studio>
            <Title>Altes Land: Roman</Title>
            <TradeInValue>
                <Amount>965</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 9,65</FormattedPrice>
            </TradeInValue>
        </ItemAttributes>
        <OfferSummary>
            <LowestNewPrice>
                <Amount>1999</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 19,99</FormattedPrice>
            </LowestNewPrice>
            <LowestUsedPrice>
                <Amount>1599</Amount>
                <CurrencyCode>EUR</CurrencyCode>
                <FormattedPrice>EUR 15,99</FormattedPrice>
            </LowestUsedPrice>
            <TotalNew>72</TotalNew>
            <TotalUsed>8</TotalUsed>
            <TotalCollectible>0</TotalCollectible>
            <TotalRefurbished>0</TotalRefurbished>
        </OfferSummary>
        <Offers>
            <TotalOffers>1</TotalOffers>
            <TotalOfferPages>1</TotalOfferPages>
            <MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</MoreOffersUrl>
            <Offer>
                <OfferAttributes>
                    <Condition>New</Condition>
                </OfferAttributes>
                <OfferListing>
                    <OfferListingId>aeRv5KPt26T8S0hLrgV8Bv9UPYABYOMijGRxffbNJXUZSN4XfeeOZZpCZ28EURzmgMLlcYEBSRlMXS%2F8Z0pN1JbYerndME%2B2VK3RosfdQJA%3D</OfferListingId>
                    <Price>
                        <Amount>1999</Amount>
                        <CurrencyCode>EUR</CurrencyCode>
                        <FormattedPrice>EUR 19,99</FormattedPrice>
                    </Price>
                    <Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
                    <AvailabilityAttributes>
                        <AvailabilityType>now</AvailabilityType>
                        <MinimumHours>0</MinimumHours>
                        <MaximumHours>0</MaximumHours>
                    </AvailabilityAttributes>
                    <IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
                </OfferListing>
            </Offer>
        </Offers>
    </Item>
</Items>

I want to get the ASIN element of each Item, so I tried this:

from lxml import etree
doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('//ASIN/text()')
    print asin

What I get is this:

['3570102769', '3813506479']
['3570102769', '3813506479']

But I want this:

['3570102769']
['3813506479']

I don't understand what the problem is. I expected to iterate over the Item elements, each of which contains exactly one ASIN. Why does it return both ASINs twice?

Upvotes: 2

Views: 104

Answers (1)

wonderb0lt

Reputation: 2053

When you call a.xpath('//ASIN/text()'), you're searching the complete document tree again. Quoting from the XML Path Language specification:

//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

So what you're doing is iterating over the matched Item nodes and saying "Give me all ASIN nodes in this document, please". The context node (the Item) only determines which document is searched, not where the search starts.
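
To see this for yourself, here is a minimal sketch, assuming lxml is installed. The two-item document is a trimmed-down stand-in for the response in the question (only the ASIN elements are kept), and the variable names are just for illustration:

```python
from lxml import etree

# Trimmed-down stand-in for the response in the question (assumption:
# everything except the ASIN elements has been left out).
xmlstring = b"""<?xml version="1.0" ?>
<Items>
    <Item><ASIN>3570102769</ASIN></Item>
    <Item><ASIN>3813506479</ASIN></Item>
</Items>"""

doc = etree.fromstring(xmlstring)
first_item = doc.xpath('//Items/Item')[0]

# A leading // starts at the document root, whichever node the query runs on.
from_item = first_item.xpath('//ASIN/text()')
from_root = doc.xpath('//ASIN/text()')

print(from_item)
print(from_root)
print(from_item == from_root)
```

Both queries should return the same two ASINs, even though the first one is run on a single Item.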

What you should do instead is select the ASIN child node directly with a relative path. Keeping to your original implementation, this could look like this:

from lxml import etree

doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    # Relative path: only ASIN children of the current Item are matched
    asin = a.xpath('ASIN/text()')
    print(asin)

which gives the output you desire:

['3570102769']
['3813506479']

Alternatively, if you're not certain where inside the Item node the ASIN appears, you could use .//ASIN/text(). The leading dot anchors the search at the current Item, so only its descendants are matched.
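
A rough sketch of the difference between the two relative forms, again assuming lxml. The Wrapper element is hypothetical and does not appear in the real response; it is only there to push the second ASIN one level deeper:

```python
from lxml import etree

# Hypothetical layout: the second ASIN is nested inside a made-up
# Wrapper element instead of being a direct child of Item.
xmlstring = b"""<Items>
    <Item><ASIN>3570102769</ASIN></Item>
    <Item><Wrapper><ASIN>3813506479</ASIN></Wrapper></Item>
</Items>"""

doc = etree.fromstring(xmlstring)

child_only = []  # results of 'ASIN/text()' per Item
any_depth = []   # results of './/ASIN/text()' per Item
for item in doc.xpath('//Items/Item'):
    # Matches ASIN only when it is a direct child of this Item.
    child_only.append(item.xpath('ASIN/text()'))
    # Matches ASIN at any depth, but still only below this Item.
    any_depth.append(item.xpath('.//ASIN/text()'))

print(child_only)
print(any_depth)
```

With this layout, the child-only path comes back empty for the second Item, while the .// form still finds its ASIN without picking up the first Item's.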

Upvotes: 2
