Derek Lee
Derek Lee

Reputation: 13

Is there a way to normalize a json pulled straight from an api

Here is the type of json file that I am working with

{
  "header": {
    "gtfsRealtimeVersion": "1.0",
    "incrementality": "FULL_DATASET",
    "timestamp": "1656447045"
  },
  "entity": [
    {
      "id": "RTVP:T:16763243",
      "isDeleted": false,
      "vehicle": {
        "trip": {
          "tripId": "16763243",
          "scheduleRelationship": "SCHEDULED"
        },
        "position": {
          "latitude": 33.497833,
          "longitude": -112.07365,
          "bearing": 0.0,
          "odometer": 16512.0,
          "speed": 1.78816
        },
        "currentStopSequence": 16,
        "currentStatus": "INCOMING_AT",
        "timestamp": "1656447033",
        "stopId": "2792",
        "vehicle": {
          "id": "5074"
        }
      }
    },
    {
      "id": "RTVP:T:16763242",
      "isDeleted": false,
      "vehicle": {
        "trip": {
          "tripId": "16763242",
          "scheduleRelationship": "SCHEDULED"
        },
        "position": {
          "latitude": 33.562374,
          "longitude": -112.07392,
          "bearing": 359.0,
          "odometer": 40367.0,
          "speed": 15.6464
        },
        "currentStopSequence": 36,
        "currentStatus": "INCOMING_AT",
        "timestamp": "1656447024",
        "stopId": "2794",
        "vehicle": {
          "id": "5251"
        }
      }
    }
  ]
}

In my code, I am taking in the json as a string. But when I try normalize json string to put into data frame

import pandas as pd
import json
import requests


base_URL = requests.get('https://app.mecatran.com/utw/ws/gtfsfeed/vehicles/valleymetro?apiKey=4f22263f69671d7f49726c3011333e527368211f&asJson=true')
packages_json = base_URL.json()
packages_str = json.dumps(packages_json, indent=1)

df = pd.json_normalize(packages_str)

I get this error, I am definitely making some rookie error, but how exactly am I using this wrong? Are there additional arguments that may need in that?

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-33-aa23f9157eac> in <module>()
      8 packages_str = json.dumps(packages_json, indent=1)
      9 
---> 10 df = pd.json_normalize(packages_str)

/usr/local/lib/python3.7/dist-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    421         data = list(data)
    422     else:
--> 423         raise NotImplementedError
    424 
    425     # check to see if a simple recursive function is possible to

NotImplementedError: 

When I had the json format within my code without the header portion referenced as an object, the pd.json_normalize(package_str) does work, why would that be, and what additional things would I need to do?

Upvotes: 0

Views: 750

Answers (2)

MangoNrFive
MangoNrFive

Reputation: 1599

The issue is, that pandas.json_normalize expects either a dictionary or a list of dictionaries but json.dumps returns a string.

It should work if you skip the json.dumps and directly input the json to the normalizer, like this:

import pandas as pd
import json
import requests


base_URL = requests.get('https://app.mecatran.com/utw/ws/gtfsfeed/vehicles/valleymetro?apiKey=4f22263f69671d7f49726c3011333e527368211f&asJson=true')
packages_json = base_URL.json()

df = pd.json_normalize(packages_json) 

If you take a look at the corresponding source-code of pandas you can see for yourself:

if isinstance(data, list) and not data:
    return DataFrame()
elif isinstance(data, dict):
    # A bit of a hackjob
    data = [data]
elif isinstance(data, abc.Iterable) and not isinstance(data, str):
    # GH35923 Fix pd.json_normalize to not skip the first element of a
    # generator input
    data = list(data)
else:
    raise NotImplementedError

You should find this code at the path that is shown in the stacktrace, with the error raised on line 423: /usr/local/lib/python3.7/dist-packages/pandas/io/json/_normalize.py

I would advise you to use a code-linter or an IDE that has one included (like PyCharm for example) as this is the type of error that doesn't happen if you have one.

Upvotes: 1

Jakub
Jakub

Reputation: 184

I m not sure where is the problem, but if you are desperate, you can always make text function that will data-mine that Json.

Yes, it will be quite tiring, but with +-10 variables you need to mine, for each row, you will be done in +-60 minutes no problem.

Something like this:

def MineJson(text, target): #target is for example "id"
    #print(text)
    findword = text.find(target)
    g=findword+len(target)+5 #should not include the first "
    new_text= text[g:] #output should be starting with RTVP:T...
    return new_text

def WhatsAfter(text): #should return new text and RTVP:T:16763243
    #print(text)
    toFind='"'
    findEnd = text.find(toFind)
    g=findEnd
    value=text[:g]
    new_text= text[g:]
    return new_text,value

I wrote it without testing, so maybe there will be some mistakes.

Upvotes: 0

Related Questions