Convert CKAN data API call from bytes into Pandas DataFrame

Question

I am trying to access data.gov.au datasets through their CKAN data API. Unfortunately, the data API instructions are slightly outdated and do not seem to work. Instructions found here.

So far, I've worked out that I am meant to query the dataset using urllib.request.

import urllib.request

req = urllib.request.Request('https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de')

with urllib.request.urlopen(req) as response:
   data = response.read()

This produces an object of type bytes that looks like a dictionary data structure, where the dataset seems to be stored in "records:".

I'm wondering how I can convert the data records into a Pandas DataFrame. I've tried converting the bytes object into a string and reading that as a json file, but the output is wrong.

# code that did not work
result = str(data, 'utf-8')
rdata = StringIO(result)
df = pd.read_json(rdata)
df

The output I would like to return looks like this:

Thanks!

Serge de Gosson de Varennes · Accepted Answer

Here is a solution that works:

import numpy as np
import pandas as pd
import requests
import json

url = "https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de"
JSONContent = requests.get(url).json()
content = json.dumps(JSONContent, indent = 4, sort_keys=True)
print(content)
df = pd.read_json(content)
df.to_csv("output.csv")
df = pd.json_normalize(df['result']['records'])

You actually were near the solution. It is only the last step df=pd.json_normalize(df['result']['records']) you were missing.

Convert CKAN data API call from bytes into Pandas DataFrame

Answers (1)

Related Questions