I want to get some data from https://kartkatalog.geonorge.no/api/search?limit=10000&text=&facets[0]name=type&facets[0]value=software&mediatype=xml
What I need is the "title" and "GetCapabilitiesUrl" for every record. I have tried playing around with BeautifulSoup, but I can't find the right way to get the data I want.
Does anyone know how to do this?
Thanks.
The link you posted returns what looks like JSON, not XML. You can see the difference here. You can use Python's json module to parse this data.
Once you have downloaded the response from the website, json.loads() converts the JSON text (a str, or bytes in Python 3.6+) into Python objects: JSON objects become dicts and JSON arrays become lists.
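A minimal illustration of json.loads() on its own, using a small hand-written string instead of the real download. The sample data is hypothetical; it is only shaped like the Results, Title and GetCapabilitiesUrl fields that this answer's code reads.

```python
import json

# Hypothetical sample text, NOT real data from the Geonorge API.
raw = '{"Results": [{"Title": "Example service", "GetCapabilitiesUrl": "https://example.com/wms?request=GetCapabilities"}]}'

data = json.loads(raw)      # JSON object -> dict, JSON array -> list
first = data["Results"][0]  # index the list, then look up keys in the dict

print(type(data).__name__)  # dict
print(first["Title"])       # Example service
```

Because the result is ordinary dicts and lists, you navigate it with ["key"] and [index] rather than with an HTML/XML parser such as BeautifulSoup.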
The following code snippet puts every title in a list called titles and every GetCapabilitiesUrl in a list called urls:
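```python
import json
import urllib.request

url = ("https://kartkatalog.geonorge.no/api/search?limit=10000&text="
       "&facets%5B0%5Dname=type&facets%5B0%5Dvalue=software&mediatype=xml")

# Download the response body and parse it into Python dicts and lists.
# Certificate verification stays on; if you see SSL errors, fix your local
# CA certificates instead of using ssl._create_unverified_context.
with urllib.request.urlopen(url) as response:
    json_object = json.loads(response.read())

titles = []
urls = []
for record in json_object["Results"]:
    titles.append(record["Title"])
    # Some records have no GetCapabilitiesUrl; .get() stores None for them,
    # so titles[i] and urls[i] always describe the same record.
    urls.append(record.get("GetCapabilitiesUrl"))
```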
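If you want each title kept next to its URL, here is a hedged sketch that works on the already-parsed object. It assumes the same Results, Title and GetCapabilitiesUrl keys used in the loop above; extract_services is a hypothetical helper name, and the sample dict is made-up data standing in for the real response.

```python
def extract_services(json_object):
    """Return (title, GetCapabilitiesUrl) pairs for records that have a URL.

    Assumes the hypothetical response shape {"Results": [{"Title": ..., "GetCapabilitiesUrl": ...}]}.
    """
    pairs = []
    for record in json_object.get("Results", []):
        url = record.get("GetCapabilitiesUrl")
        if url:  # skip records whose URL is missing or empty
            pairs.append((record.get("Title"), url))
    return pairs


# Hypothetical sample shaped like the API response (not real data).
sample = {
    "Results": [
        {"Title": "Service A", "GetCapabilitiesUrl": "https://example.com/a?request=GetCapabilities"},
        {"Title": "Service B"},
    ]
}

for title, url in extract_services(sample):
    print(title, url)
```

Storing each title and URL together in one tuple avoids having to keep two parallel lists aligned; with the real data you would call extract_services(json_object).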
While writing the code, an online JSON viewer helps you see how the dictionaries and lists are nested and which keys each record has.
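As an offline alternative to an online viewer, json.dumps(..., indent=2) pretty-prints part of the parsed object, and listing a record's keys shows which fields are available. The sample dict below is a hypothetical stand-in; in practice you would pass the json_object from the download.

```python
import json

# Hypothetical stand-in for the parsed API response (not real data).
json_object = {"Results": [{"Title": "Service A", "GetCapabilitiesUrl": "https://example.com/a"}]}

first_record = json_object["Results"][0]

# Pretty-print one record; ensure_ascii=False keeps characters like æ, ø, å readable.
print(json.dumps(first_record, indent=2, ensure_ascii=False))

# List the keys available on that record.
print(sorted(first_record.keys()))  # ['GetCapabilitiesUrl', 'Title']
```

Looking at a single record first is usually enough to find the key names before writing the full loop.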