Chris Wood
Chris Wood

Reputation: 35

Pandas Nested JSON - id as column name, value as value

JSON example

{
    "tickets": [
        {
            "url": "https://domain.zendesk.com/api/v2/tickets/10001.json",
            "id": 10001,
            "custom_fields": [
                {
                    "id": 360007982393,
                    "value": "Some Value"
                },
                {
                    "id": 360008063134,
                    "value": "Foo"
                },
                {
                    "id": 360007982273,
                    "value": "Bar"
                },
                {
                    "id": 360007982293,
                    "value": null
                }
            ],
            "satisfaction_rating": null
        },
        {
            "url": "https://domain.zendesk.com/api/v2/tickets/10002.json",
            "id": 10002,
            "custom_fields": [
                {
                    "id": 360007982393,
                    "value": "Another value"
                },
                {
                    "id": 360008063134,
                    "value": "Bar"
                },
                {
                    "id": 360007982273,
                    "value": "Foo"
                },
                {
                    "id": 360007982293,
                    "value": null
                }
            ],
            "satisfaction_rating": null
        }
    ],
    "count": 2,
    "next_page": "https://domain.zendesk.com/api/v2/incremental/tickets.json?start_time=1541167467",
    "end_time": 1541167467
}

Sample Python

json = << Above JSON >>
tickets_json = json['tickets']
result = json_normalize(data=tickets_json, sep='_')

df=pd.DataFrame(result)

Query description

So with the above, I get a dataframe that comes out as:

url, id, custom_fields, satisfaction_rating

Custom_Fields is where I am struggling, as I need to get:

url, id, custom_fields_360007982393, custom_fields_360008063134, custom_fields_360007982273, custom_fields_360007982293, satisfaction_rating

Or similar to the above, as effectively I need the id's from within the custom values to part of or become the names of the columns in the main dataframe.

I have tried to use the record_path with meta, but this flips the data frame into an unusable format for what I am trying to achieve here. I tried pulling out custom_fields and then appending it back, but all I could find was to just have a random number as the column name and then just have the same id, value pair as the value on each row.

This data will get imported into MySQL and be used to report on the values against the ID. For the most part the custom_fields are in the same order, however I cannot be certain they will remain this way forever.

The JSON comes from the ZenDesk API (https://developer.zendesk.com/rest_api)

The output goal:

url, id, 360007982393, 360008063134, 360007982273, 360007982293, satisfaction_rating
"https:.." , 10001, "Some Value", "Foo", "Bar", null, null
"https:.." , 10002, "Another value", "Bar", "Foo", null, null

Upvotes: 1

Views: 630

Answers (1)

Lukas Thaler
Lukas Thaler

Reputation: 2720

What about writing a custom parser function for the above json format? Below a script that transfroms one ticket into a "flat" json that will work with json_normalize:

def parseCustoms(input):
    out = {'url': input['url'],
           'id': input['id'],
           'satisfaction_rating': input['satisfaction_rating']}
    cust_fields = [(str(x['id']),x['value']) for x in input['custom_fields']]
    for field in cust_fields:
        out['cf_' + field[0]] = field[1]
    return out

You would create your parsed array like

parsed_tickets = [parseCustoms(ticket) for ticket in tickets_json]

And now, json_normalize will behave as desired

result = json_normalize(parsed_tickets)

Upvotes: 1

Related Questions