cr_untilted
cr_untilted

Reputation: 45

Parsing JSON in Python and Converting to Excel

I am attempting to parse this JSON and convert it to Excel and only print out certain columns using Python but I keep getting an TypeError: string indices must be integers. The JSON file continues to repeat what is inside "d" over and over again.

JSON:

{
    "d": [
        {
            "__type": "New Cust",
            "Description": "TV Purchase",
            "End": "/Date(1624962600000)/",
            "ID": 1,
            "Resources": [
                {
                    "Available": true,
                    "Key": "12345",
                    "Text": "John Doe",
                    "Type": "SalesProvider",
                    "CalendarID": "1234"
                }
            ],
            "Subject": "Meeting with Sam",
            "Visible": true,
            "AppStatus": "5",
            "StartTime": "06/01/2021 10:30:00",
            "AppID": "1",
            "ServiceID": "7",
            "NotesCount": "0",
            "CustomerID": "1",
            "AppType": "NNR",
            "IsEditedThis": "False",
            "BusinessPackageID": "0",
            "PopupNotesCount": "0",
            "EventType": "1",
            "SPName": "John Doe",
            "SPID": "12345",
            "EventCapacity": "0",
            "IsPaid": "False",
            "IsWorkShop": "False",
            "FormStatus": "0",
            "PaidType": "1",
            "AppComment": "",
            "CustName": "Sam Will",
            "ResourceID": "",
            "CssClass": "rsCategoryBlue",
            "ServiceName": "TV Sale",
            "NoOfAttendees": null,
            "StreamingNoOfAttendees": null,
            "StreamingStatus": "0",
            "StreamingEventCapacity": "",
            "Photo": "",
            "PersonalOffType": null,
            "ResourceName": null,
            "IsShowCheckIn": false,
            "PaymentStatus": 0
        },
        {
            "__type": "New Cust",
            "Description": "Receiver Purchase",
            "End": "/Date(1624962600000)/",
            "ID": 1,
            "Resources": [
                {
                    "Available": true,
                    "Key": "12345",
                    "Text": "John Doe",
                    "Type": "SalesProvider",
                    "CalendarID": "1234"
                }
            ],
            "Subject": "Meeting with Bill",
            "Visible": true,
            "AppStatus": "5",
            "StartTime": "07/02/2021 9:30:00",
            "AppID": "1",
            "ServiceID": "7",
            "NotesCount": "0",
            "CustomerID": "1",
            "AppType": "NNR",
            "IsEditedThis": "False",
            "BusinessPackageID": "0",
            "PopupNotesCount": "0",
            "EventType": "1",
            "SPName": "John Doe",
            "SPID": "12345",
            "EventCapacity": "0",
            "IsPaid": "False",
            "IsWorkShop": "False",
            "FormStatus": "0",
            "PaidType": "1",
            "AppComment": "",
            "CustName": "Bill Tom",
            "ResourceID": "",
            "CssClass": "rsCategoryBlue",
            "ServiceName": "Audio Sale",
            "NoOfAttendees": null,
            "StreamingNoOfAttendees": null,
            "StreamingStatus": "0",
            "StreamingEventCapacity": "",
            "Photo": "",
            "PersonalOffType": null,
            "ResourceName": null,
            "IsShowCheckIn": false,
            "PaymentStatus": 0
        }
    ]
}

Python Code:

import json
import pandas as pd

f = open('JSON.txt', 'r')
data = json.loads(f.read())
l = []
for profile in data['d']:
    l.append(profile["Subject"]["StartTime"]["IsPaid"]["CustName"]["ServiceName"])
df1 = pd.DataFrame(l)
print(df1)
df1.to_excel('df1.xlsx')

I do not need the "Resources": [] info I just need certain parameters outside it in the JSON object. I am having difficulty parsing the JSON any help would be appreciated.

Upvotes: 0

Views: 614

Answers (1)

Umar.H
Umar.H

Reputation: 23099

you can use a combination of the json standard library and pd.json_normalize

import json 
import pandas as pd


parsed_json = json.loads('JSON.txt')

df = pd.json_normalize(parsed_json['d'],record_path='Resources')

print(df)

   Available    Key      Text           Type CalendarID
0       True  12345  John Doe  SalesProvider       1234
1       True  12345  John Doe  SalesProvider       1234

then pass it to excel

df.to_excel(...,index=False)

Going back to your issues, it seems like your trying to grab a bunch of fields like a list, when in reality you're attempting to find an attribute from a single data type object.

print(data['d'][0]['Subject'])

'Meeting with Sam'

This has no other nested items so you'll naturally get an error.

the error TypeError: string indices must be integers is telling you you can only slice this object with integers i.e

#reversing the string.
print(data['d'][0]['Subject'][::-1])
maS htiw gniteeM

or

#first two characters of the string.
print(data['d'][0]['Subject'][:2])
Me

if you want to grab only a subset of columns from the top level you could do :

cols = ['Subject', 'StartTime', 'IsPaid', 'CustName', 'ServiceName']

df = pd.json_normalize(parsed_json['d'],)[cols]

print(df)

             Subject            StartTime IsPaid  CustName ServiceName
0   Meeting with Sam  06/01/2021 10:30:00  False  Sam Will     TV Sale
1  Meeting with Bill   07/02/2021 9:30:00  False  Bill Tom  Audio Sale

Upvotes: 1

Related Questions