Looping json data dynamically in Python

Question

I have a two json file in a server. The first json file is a dataframe in json format where it is having 21 columns.

The second json object is a collection of different filters to be applied on the first json(data file) & I want to calculate the reduction in amount column dynamically after applying each filter.

Both the jsons are in server. Sample of this is like below,

[{
        "criteria_no.": 1,
        "expression": "!=",
        "attributes": "Industry_name",
        "value": "Clasentrix"

    },{ 
        "criteria_no.": 2,
        "expression": "=",
        "attributes": "currency",
        "value": ["EUR","GBP","INR"]


    },{
        "criteria_no.": 3,
        "expression": ">",
        "attributes": "Industry_Rating",
        "value": "A3"

    },{
        "criteria_no.": 4,
        "expression": "<",
        "attributes": "Due_date",
        "value": "01/01/2025"

    }
    ]

When coded in python, it is like below,

import urllib2, json
url = urllib2.urlopen('http://.../server/criteria_sample.json')
obj = json.load(url)
print obj

[{u'attributes': u'Industry_name', u'expression': u'!=', u'value': u'Clasentrix', u'criteria_no.': 1}, {u'attributes': u'currency', u'expression': u'=', u'value': [u'EUR', u'GBP', u'INR'], u'criteria_no.': 2}, {u'attributes': u'Industry_Rating', u'expression': u'>', u'value': u'A3', u'criteria_no.': 3}, {u'attributes': u'Due_date', u'expression': u'<', u'value': u'01/01/2025', u'criteria_no.': 4}]

Now, in the sample json, we can see "attributes", which are nothing but the columns present in the first data file. I mentioned it has 21 columns, "Industry_name","currency","Industry_Rating","Due_date" are four of them. "Loan_amount" is another column present there in the data file along with all them.

Now as this criteria list is only a sample, we are having n number of such criterion or filters. I want this filters to be applied dynamically on the data file & I would like to calculate the reduction in loan amount. Let's consider the first filter, it is saying "Industry_name" column should not have "Clasentrix". So from the data file I want to filter "Industry_name", which will not have 'Clasentrix' entry. Now let's say for 11 observations we had 'Clasentrix' out of 61 observations from the data file. Then we will take sum of entire loan amount(61 rows) & then subtract the sum of loan amount for 11 rows which consist 'Clasentrix' from the total loan amount. This number will be considered as reduction for after applying the first filter.

Now for each of n criterion I want to calculate the reduction dynamically in python. So inside the loop the filter json file will create filter considering attribute, expression & value. Just like for the first filter it is "Industry_name != 'Clasentrix'". This should get reflected for each set of rows for the json object like for the second criterion(filter) it should be "currency=['EUR','GBP','INR']" & so on. I also want to calculate the reduction accordingly.

I am struggling to create the python code for the above mentioned exercise. My post is too long, apologies for that. But please provide assistance that how can I calculate the reduction dynamically for each n criterion.

Thanks in advance!!

UPDATE for first data file, find some sample rows;

[{
        "industry_id.": 1234,
        "loan_id": 1113456,
        "Industry_name": "Clasentrix",
        "currency": "EUR",
        "Industry_Rating": "Ba3",
        "Due_date": "20/02/2020",
        "loan_amount": 563332790,
        "currency_rate": 0.67,
        "country": "USA"


    },{ 
        "industry_id.": 6543,
        "loan_id": 1125678,
        "Industry_name": "Wolver",
        "currency": "GBP",
        "Industry_Rating": "Aa3",
        "Due_date": "23/05/2020",
        "loan_amount": 33459087,
        "currency_rate": 0.8,
        "country": "UK"


    },{
        "industry_id.": 1469,
        "loan_id": "8876548",
        "Industry_name": "GroupOn",
        "currency": "EUR",
        "Industry_Rating": "Aa1",
        "Due_date": "16/09/2021",
        "loan_amount": 66543278,
        "currency_rate": 0.67,
        "country": "UK"
    },{
        "industry_id.": 1657,
        "loan_id": "6654321",
        "Industry_name": "Clasentrix",
        "currency": "EUR",
        "Industry_Rating": "Ba3",
        "Due_date": "15/07/2020",
        "loan_amount": 5439908765,
        "currency_rate": 0.53,
        "country": "USA"

    }
    ]

Stuart · Accepted Answer

You can use Pandas to turn the json data into a dataframe and turn the criteria into query strings. Some processing is needed to turn the criteria json into a valid query. In the code below dates are still treated as strings - you may need to explicitly set date queries to convert the string into a date first.

import pandas as pd
import json
# ...
criteria = json.load(url)
df = pd.DataFrame(json.load(data_url)) # data_url is the handle of the data file
print("Loan total without filters is {}".format(df["loan_amount"].sum()))

for c in criteria:
    if c["expression"] == "=":
        c["expression"] = "=="

    # If the value is a string we need to surround it in quotation marks
    # Note this can break if any values contain "
    if isinstance(c["value"], basestring):
        query = '{attributes} {expression} "{value}"'.format(**c)
    else:
        query = '{attributes} {expression} {value}'.format(**c)
    loan_total = df.query(query)["loan_amount"].sum()
    print "With criterion {}, {}, loan total is {}".format(c["criteria_no."], query, loan_total)

Alternatively you can turn each criterion into an indexing vector like this:

def criterion_filter(s, expression, value):
    if type(value) is list:
        if expression == "=":
            return s.isin(value)
        elif expression == "!=":
            return ~s.isin(value)
    else:
        if expression == "=":
            return s == value
        elif expression == "!=":
            return s != value
        elif expression == "<":
            return s < value
        elif expression == ">":
            return s > value        

for c in criteria:
    filt = criterion_filter(df[c["attributes"]], c["expression"], c["value"])
    loan_total = df[filt]["loan_amount"].sum()
    print "With criterion {}, loan total is {}".format(c["criteria_no."],  loan_total)

EDIT: To calculate the cumulative reduction in loan total, you can combine the indexing vectors using the & operator.

loans = [df["loan_amount"].sum()]
print("Loan total without filters is {}".format(loans[0]))
filt = True
for c in criteria:
    filt &= criterion_filter(df[c["attributes"]], c["expression"], c["value"])
    loans.append(df[filt]["loan_amount"].sum())
    print "Adding criterion {} reduces the total by {}".format(c["criteria_no."],
        loans[-2] - loans[-1])
    print "The cumulative reduction is {}".format(loans[0] - loans[-1])

Looping json data dynamically in Python

Answers (1)

Related Questions