Nick
Nick

Reputation: 687

Python: Scraping .json, feeding into pandas DataFrame and output as excel

Good morning/afternoon/evening to everyone reading this.

So as you will see from the snippet below I am not a programmer and I'm really struggling to make this code give me the desired output.
Quick rundown:

Now what I'm hoping to get for the output excel files would look something like this:

[[1]

However what I'm currently getting as output looks like this:

enter image description here

This is what fdir[0] returns:

  [{'Nama': 'Santosa', 'Jabatan': 'PRESIDEN DIREKTUR', 'Afiliasi': False}, {'Nama': 'Joko Supriyono', 'Jabatan': 'WAKIL PRESIDEN DIREKTUR', 'Afiliasi': False}, {'Nama': 'M. Hadi Sugeng Wahyudiono', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Bambang Wijanarko', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Rujito Purnomo', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Handoko Pranoto', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Mario Casimirus Surung Gultom', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}]

So if anyone has an idea of how to get from what I have to the intended output I would be very grateful! Thanks for taking the time, complete code is below:

import requests
import pandas as pd ; import xlsxwriter
import json
import time
# gets broad data of main page of the stock exchange
sxov = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')

data = sxov.json() # save the request as .json file
df = pd.DataFrame.from_dict(data['data']) #creates DataFrame based on the data (.json) file
# removes unecessary columns from df
df.drop(["BAE", "DataID",   "Divisi",   "EfekEmiten_EBA",   "EfekEmiten_ETF",   "EfekEmiten_Obligasi",  "EfekEmiten_SPEI",  "EfekEmiten_Saham",
"Fax",  "JenisEmiten", "KodeDivisi", "Logo",    "NPKP",     "NPWP", "PapanPencatatan", "Status", "TanggalPencatatan", "id"
], axis=1, inplace=True)

cdate = time.strftime ("%Y%m%d") # creating string-variable w/ current date year|month|day
df.to_excel(f"{cdate}StockExchange_Overview.xlsx") # outputs DataFrame as Excel file

list_of_json = []
for nested_json in data['data'] :
    list_of_json.append(requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten='+nested_json['KodeEmiten']).json())

#create empty lists for directors
fdir = []

for i in range (len(list_of_json)) :
    fdir.append(list_of_json[i].get('Direktur'))
    i += 1


print (fdir[0])
# create writer object that converts DataFrame to {currentdate}Firm Details.xlsx
writer = pd.ExcelWriter(f'{cdate}Firm Details.xlsx', engine = 'xlsxwriter')

#creating the dataframes
dfdir = pd.DataFrame([fdir])
dfdir = (dfdir.T)

dfdir.to_excel(writer, sheet_name = 'Directors')

writer.save()

Upvotes: 0

Views: 514

Answers (1)

B Troy
B Troy

Reputation: 181

I suspect that json_normalize will help. It'll transfer your json file to a flat table structure.
documentation_link

eg:

from pandas.io.json import json_normalize
with open('example_1.json') as data_file:    
    d = json.load(data_file)
    df = json_normalize(d)

Upvotes: 1

Related Questions