Reputation: 1286
I am using the DBpedia Lookup API, which returns its response as XML like the following:
<ArrayOfResults>
<Result>
<Label>China</Label>
<URI>http://dbpedia.org/resource/China</URI>
<Description>China .... administrative regions of Hong Kong and Macau.</Description>
<Classes>
<Class>
<Label>Place</Label>
<URI>http://dbpedia.org/ontology/Place</URI>
</Class>
<Class>
<Label>Country</Label>
<URI>http://dbpedia.org/ontology/Country</URI>
</Class>
</Classes>
<Categories>
<Category>
<URI>http://dbpedia.org/resource/Category:Member_states_of_the_United_Nations</URI>
</Category>
<Category>
<URI>http://dbpedia.org/resource/Category:Republics</URI>
</Category>
</Categories>
<Refcount>12789</Refcount>
</Result>
<Result>
<Label>Theatre of China</Label>
<URI>http://dbpedia.org/resource/Theatre_of_China</URI>
<Description>Theatre of China ... the 20th century.</Description>
<Classes/>
<Categories>
<Category>
<URI>http://dbpedia.org/resource/Category:Asian_drama</URI>
</Category>
<Category>
<URI>http://dbpedia.org/resource/Category:Chinese_performing_arts</URI>
</Category>
</Categories>
<Refcount>23</Refcount>
</Result>
</ArrayOfResults>
I have shortened it. The full response can be found in this link
Now I need to retrieve all the values under the <Label> and <URI> tags.
Here's what I've done so far:
import requests
import xml.etree.ElementTree as ET
response = requests.get('https://lookup.dbpedia.org/api/search?query=China')
response_body = response.content
response_xml = ET.fromstring(response_body)
root = ET.fromstring(response_body)
for child in root:
    print(child.tag)
    for grandchild in child:
        print(f"\t {grandchild.tag}")
        label = grandchild.find('Label')
        uri = grandchild.find('URI')
        print(f"\t required label = {label}")
        print(f"\t required uri = {uri}")
But the value of label and uri is None in every case. How can I solve this so that I get the text of every <Label> tag directly under a <Result> (China, Theatre of China, etc.) together with the text of the <URI> tag under the same <Result>?
Upvotes: 0
Views: 279
Reputation: 65791
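You're actually nesting too deep. find only searches the direct children of the element it is called on, and <Label> and <URI> are direct children of <Result>, not of its children. You need to call find on child (which is a <Result> element):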
for child in root:
    label = child.find('Label').text
    uri = child.find('URI').text
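For reference, here is a minimal self-contained sketch of the same idea that runs without a network call. It assumes a shortened inline sample in the shape of the response from the question; SAMPLE_XML and extract_results are illustrative names, not part of any API. It uses findtext, which returns None instead of raising when a tag is missing.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the response shown in the question
# (an assumption for illustration; the live API may return more fields).
SAMPLE_XML = """
<ArrayOfResults>
  <Result>
    <Label>China</Label>
    <URI>http://dbpedia.org/resource/China</URI>
    <Classes>
      <Class>
        <Label>Place</Label>
        <URI>http://dbpedia.org/ontology/Place</URI>
      </Class>
    </Classes>
  </Result>
  <Result>
    <Label>Theatre of China</Label>
    <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
    <Classes/>
  </Result>
</ArrayOfResults>
""".strip()


def extract_results(xml_text):
    """Return a list of (label, uri) tuples, one per <Result> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    # findall('Result') only looks at direct children of the root.
    for result in root.findall('Result'):
        # findtext looks at direct children of <Result> only, so the
        # nested <Class> labels are not picked up here.
        label = result.findtext('Label')
        uri = result.findtext('URI')
        pairs.append((label, uri))
    return pairs


pairs = extract_results(SAMPLE_XML)
for label, uri in pairs:
    print(f"{label} -> {uri}")
```

With the live response you would pass response.content to the same function instead of the inline sample.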
Upvotes: 1
Reputation: 121
Hi, I don't know whether you need to know which URLs are connected to which labels, but this would be a very simple way to get all the URLs out:
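import requests
from bs4 import BeautifulSoup
url = 'https://lookup.dbpedia.org/api/search?query=China'
# The 'xml' parser needs lxml installed; find('Result') keeps only the first <Result>.
soup = BeautifulSoup(requests.get(url).text, 'xml').find('Result')
# find_all searches all descendants, so nested <Class>/<Category> entries are included.
labels = [label.text for label in soup.find_all('Label')]
URI = [uri.text for uri in soup.find_all('URI')]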
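One thing to keep in mind: find_all searches all descendants, so the lists also contain the labels and URIs of nested <Class> and <Category> elements, not only those of the <Result> itself. A rough sketch of that difference follows, using the standard library's ElementTree rather than BeautifulSoup so that it runs without extra packages. The inline sample is a shortened assumption based on the response in the question.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative sample of a single <Result> with nested classes.
SAMPLE_XML = """
<ArrayOfResults>
  <Result>
    <Label>China</Label>
    <URI>http://dbpedia.org/resource/China</URI>
    <Classes>
      <Class>
        <Label>Place</Label>
        <URI>http://dbpedia.org/ontology/Place</URI>
      </Class>
      <Class>
        <Label>Country</Label>
        <URI>http://dbpedia.org/ontology/Country</URI>
      </Class>
    </Classes>
  </Result>
</ArrayOfResults>
""".strip()

root = ET.fromstring(SAMPLE_XML)
first_result = root.find('Result')

# Direct children only: just the label of the result itself.
direct_labels = [el.text for el in first_result.findall('Label')]

# All descendants in document order: also includes the class labels,
# which is comparable to what a descendant search such as find_all collects.
all_labels = [el.text for el in first_result.iter('Label')]

print(direct_labels)
print(all_labels)
```

If you only want the top-level label and URI of each result, the direct-child lookup is the one to use.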
Upvotes: 0