Reputation: 93
I am trying to access data.gov.au datasets through their CKAN data API. Unfortunately, the data API instructions are slightly outdated and do not seem to work. Instructions found here.
So far, I've worked out that I am meant to query the dataset using urllib.request
.
import urllib.request
req = urllib.request.Request('https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de')
with urllib.request.urlopen(req) as response:
data = response.read()
This produces an object of type bytes that looks like a dictionary data structure, where the dataset seems to be stored in "records:"
.
I'm wondering how I can convert the data records into a Pandas DataFrame. I've tried converting the bytes object into a string and reading that as a json file, but the output is wrong.
# code that did not work
result = str(data, 'utf-8')
rdata = StringIO(result)
df = pd.read_json(rdata)
df
The output I would like to return looks like this:
Thanks!
Upvotes: 0
Views: 384
Reputation: 11504
Here is a solution that works:
import numpy as np
import pandas as pd
import requests
import json
url = "https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de"
JSONContent = requests.get(url).json()
content = json.dumps(JSONContent, indent = 4, sort_keys=True)
print(content)
df = pd.read_json(content)
df.to_csv("output.csv")
df = pd.json_normalize(df['result']['records'])
You actually were near the solution. It is only the last step df=pd.json_normalize(df['result']['records'])
you were missing.
Upvotes: 2