ganjaam

Reputation: 1286

How to retrieve all values of the same tag from an XML response using Python?

I am using the DBpedia Lookup API, which returns a response in XML format like the following:

<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Description>China .... administrative regions of Hong Kong and Macau.</Description>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
            <Class>
                <Label>Country</Label>
                <URI>http://dbpedia.org/ontology/Country</URI>
            </Class>
        </Classes>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Member_states_of_the_United_Nations</URI>
            </Category>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Republics</URI>
            </Category>
        </Categories>
        <Refcount>12789</Refcount>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Description>Theatre of China ... the 20th century.</Description>
        <Classes/>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Asian_drama</URI>
            </Category>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Chinese_performing_arts</URI>
            </Category>
        </Categories>
        <Refcount>23</Refcount>
    </Result>
</ArrayOfResults>

I have shortened it. The full response can be found in this link

Now, I need to retrieve all the values under the <Label> and <URI> tags.

Here's what I've done so far:

import requests
import xml.etree.ElementTree as ET

response = requests.get('https://lookup.dbpedia.org/api/search?query=China')
response_body = response.content

response_xml = ET.fromstring(response_body)

root = ET.fromstring(response_body)
for child in root:
    print(child.tag)
    for grandchild in child:
        print(f"\t {grandchild.tag}")
        label = grandchild.find('Label')
        uri = grandchild.find('URI')
        print(f"\t required label = {label}")
        print(f"\t required uri = {uri}")

But label and uri are None in every case. How can I fix this so that I get, for each <Result>, the value of its own <Label> tag (China, Theatre of China, etc.) and the value of its own <URI> tag?

Upvotes: 0

Views: 279

Answers (2)

Lev Levitsky

Reputation: 65791

You're actually nesting too deep. You need to call find on child (which is a <Result> element):

for child in root:
    label = child.find('Label').text
    uri = child.find('URI').text
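
As a fuller sketch of the same idea, the snippet below runs against a trimmed copy of the XML from the question instead of the live API, so it works offline. The names SAMPLE_XML and extract_results are made up for this illustration and are not part of any library. In the original loop, find was called on the grandchildren (<Label>, <URI>, <Classes>, ...), which have no direct <Label> or <URI> children, hence the None values.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same shape as the Lookup response in the question
# (assumption: the live response keeps this structure).
SAMPLE_XML = """
<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
        </Classes>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Classes/>
    </Result>
</ArrayOfResults>
""".strip()


def extract_results(xml_text):
    """Return a list of (label, uri) pairs, one per <Result> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for result in root.findall('Result'):
        # A bare tag name matches direct children only, so the nested
        # <Class>/<Label> and <Class>/<URI> values are not picked up.
        # findtext() returns None instead of raising if the tag is missing.
        label = result.findtext('Label')
        uri = result.findtext('URI')
        pairs.append((label, uri))
    return pairs


pairs = extract_results(SAMPLE_XML)
for label, uri in pairs:
    print(f"{label}: {uri}")
```

To use it on the real response, pass response.content from the question's requests.get call to extract_results in place of SAMPLE_XML.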

Upvotes: 1

Lukas Muijs

Reputation: 121

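I don't know whether you need to know which URIs belong to which labels, but this is a very simple way to get all of the labels and URIs out (it needs BeautifulSoup, plus lxml for the 'xml' parser):

from bs4 import BeautifulSoup
import requests

url = 'https://lookup.dbpedia.org/api/search?query=China'

# Parse the whole document; calling .find('Result') here would keep only the first result
soup = BeautifulSoup(requests.get(url).text, 'xml')

# find_all() searches recursively, so nested <Class> and <Category> entries are included too
labels = [label.text for label in soup.find_all('Label')]

uris = [uri.text for uri in soup.find_all('URI')]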

Upvotes: 0
