EMD
EMD

Reputation: 93

Convert CKAN data API call from bytes into Pandas DataFrame

I am trying to access data.gov.au datasets through their CKAN data API. Unfortunately, the data API instructions are slightly outdated and do not seem to work. Instructions found here.

So far, I've worked out that I am meant to query the dataset using urllib.request.

import urllib.request

req = urllib.request.Request('https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de')

with urllib.request.urlopen(req) as response:
   data = response.read()  

This produces an object of type bytes that looks like a dictionary data structure, where the dataset seems to be stored in "records:".

enter image description here

I'm wondering how I can convert the data records into a Pandas DataFrame. I've tried converting the bytes object into a string and reading that as a json file, but the output is wrong.

# code that did not work
result = str(data, 'utf-8')
rdata = StringIO(result)
df = pd.read_json(rdata)
df

Wrong output

The output I would like to return looks like this:

enter image description here

Thanks!

Upvotes: 0

Views: 384

Answers (1)

Here is a solution that works:

import numpy as np
import pandas as pd
import requests
import json

url = "https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de"
JSONContent = requests.get(url).json()
content = json.dumps(JSONContent, indent = 4, sort_keys=True)
print(content)
df = pd.read_json(content)
df.to_csv("output.csv")
df = pd.json_normalize(df['result']['records'])

You actually were near the solution. It is only the last step df=pd.json_normalize(df['result']['records']) you were missing.

enter image description here

Upvotes: 2

Related Questions