leeprevost

Reputation: 444

Parsing a json string with floats as strings in python

I am writing a class that uses Python requests to read JSON from a URL. The JSON is unusual in that the floats, dates, and integers are all encoded as strings.

{
"financialStatementList" : [ {
"symbol" : "HUBS",
"financials" : [ {
  "date" : "2018-12-31",
  "Revenue" : "512980000.0",
  "Revenue Growth" : "0.3657",
  "Cost of Revenue" : "100357000.0",
  "Gross Profit" : "412623000.0",
  "R&D Expenses" : "117603000.0",
  "SG&A Expense" : "343278000.0",
  "Operating Expenses" : "460881000.0",
  "Operating Income" : "-48258000.0",
  "Interest Expense" : "21386000.0",
  "Earnings before Tax" : "-61960000.0",
  "Income Tax Expense" : "1868000.0",
  "Net Income - Non-Controlling int" : "0.0",
  "Net Income - Discontinued ops" : "0.0",
  "Net Income" : "-63828000.0",
  "Preferred Dividends" : "0.0",
  "Net Income Com" : "-63828000.0",
  "EPS" : "-1.66",
  "EPS Diluted" : "-1.66",
  "Weighted Average Shs Out" : "39232269.0",
  "Weighted Average Shs Out (Dil)" : "38529000.0",
  "Dividend per Share" : "0.0",
  "Gross Margin" : "0.8044",
  "EBITDA Margin" : "-0.033",
  "EBIT Margin" : "-0.0791",
  "Profit Margin" : "-0.124",
  "Free Cash Flow margin" : "0.1002",
  "EBITDA" : "-17146000.0",
  "EBIT" : "-40574000.0",
  "Consolidated Income" : "-63828000.0",
  "Earnings Before Tax Margin" : "-0.1208",
  "Net Profit Margin" : "-0.1244"
}

An example of the json at the api endpoint is here: financialmodelingprep.com

My problem is that when I decode this, I end up with objects/strings rather than floats or integers.

I've tried:

r = requests.get(url, params)
jd = json.loads(r.text)

as well as:

r = requests.get(url, params)
jd = r.json()

I have also tried variations using keyword arguments such as parse_float=float or parse_float=Decimal.

My end goal is to get this into a format with floats, ints, and dates.

Upvotes: 0

Views: 2506

Answers (1)

leeprevost

Reputation: 444

I ended up needing to write a custom object hook for my json decoder.
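
As far as I can tell, the reason parse_float had no effect is that the json module only calls it for unquoted JSON number literals, and every value in this feed is a quoted string. A minimal sketch of the difference, using a made-up one-key payload rather than the real API response:

```python
import json
from decimal import Decimal

# A real JSON number literal: parse_float is applied to it.
as_number = json.loads('{"EPS": -1.66}', parse_float=Decimal)

# The same value wrapped in quotes is a JSON string: parse_float never sees it.
as_string = json.loads('{"EPS": "-1.66"}', parse_float=Decimal)

print(type(as_number["EPS"]).__name__)  # Decimal
print(type(as_string["EPS"]).__name__)  # str
```

So the conversion has to happen after the strings are decoded, which is what an object hook allows.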

I also decided to add a camelizer to shorten the keys.

import requests
import re
import json
from datetime import datetime

quarterdateformat = '%Y-%m-%d'


def camelize(string):
    return "".join(string.split(" "))

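def convert_types(d):
    # object hook: json.loads calls this once for every decoded JSON object
    for k, v in d.items():
        new_v = v
        if isinstance(v, str):
            # match for float (the whole string must match, e.g. "-1.66")
            if re.fullmatch(r'[-+]?[0-9]*\.[0-9]+', v):
                new_v = float(v)

            # match for date (the whole string must be YYYY-MM-DD)
            elif re.fullmatch(r'[12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])', v):
                new_v = datetime.strptime(v, quarterdateformat).date()

        d[k] = new_v
    # strip the spaces from every key
    d = {camelize(k): v for k, v in d.items()}
    return d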

url = "https://financialmodelingprep.com/api/v3/financials/income-statement/CRM,HUBS"
params = {'datatype': 'json'}
r = requests.get(url, params)
jd = json.loads(r.text, object_hook=convert_types)

convert_types is the object-hook function, which uses regular expressions to detect floats and dates and convert them. The camelizer runs at the end of the object hook and removes the spaces from every key, so "Net Income" becomes "NetIncome".
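
To try the idea without hitting the network, here is a self-contained sketch that runs an object hook over a shortened copy of the payload from the question. The names hook, sample, FLOAT_RE and DATE_RE are made up for this illustration, and the date pattern is deliberately simpler than the one above.

```python
import json
import re
from datetime import date, datetime

FLOAT_RE = re.compile(r'[-+]?[0-9]*\.[0-9]+')
DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')


def hook(d):
    # Called by json.loads for every decoded JSON object, innermost first.
    out = {}
    for key, value in d.items():
        if isinstance(value, str):
            if FLOAT_RE.fullmatch(value):
                value = float(value)
            elif DATE_RE.fullmatch(value):
                value = datetime.strptime(value, '%Y-%m-%d').date()
        # Remove spaces from the key, as the camelizer does.
        out[key.replace(' ', '')] = value
    return out


# Shortened sample based on the JSON shown in the question.
sample = '''
{"financialStatementList": [
  {"symbol": "HUBS",
   "financials": [
     {"date": "2018-12-31", "Revenue": "512980000.0", "EPS Diluted": "-1.66"}
   ]}
]}
'''

jd = json.loads(sample, object_hook=hook)
row = jd["financialStatementList"][0]["financials"][0]
print(row)
```

Note that with these patterns a string such as "123" (no decimal point) would stay a string, so true integers would need an extra branch if the feed ever contains them.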

Upvotes: 1
