Reputation: 20362
I am trying to loop through four URLs in a list, scrape the contents of each URL, and save each as a separate CSV. I think my code below is close, but it doesn't really seem to parse the JSON strings into a human-readable format. Also, the headers are missing.
Here is my hacked-together code.
import urllib
import requests
import json
import pandas as pd
from pandas.io.json import json_normalize
all_links = ['https://www.baptisthealthsystem.com/docs/global/standard-charges/474131755_abrazomaranahospital_standardcharges.json?sfvrsn=9a27928_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621861138_abrazocavecreekhospital_standardcharges.json?sfvrsn=674fd6f_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621809851_abrazomesahospital_standardcharges.json?sfvrsn=13953222_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621811285_abrazosurprisehospital_standardcharges.json?sfvrsn=c8113dcf_2']
for item in all_links:
    #print(item)
    try:
        length = len(item)
        first_under = item.find('_') + 1
        last_under = item.rfind('?') - 21
        file_name = item[first_under:last_under]
        r = requests.get(item)
        print(r.json)
        df = pd.DataFrame(r)
        df.head()
        DOWNLOAD_PATH = 'C:\\Users\\ryans\\Desktop\\hospital_data\\' + file_name + '.csv'
        #urllib.request.urlretrieve(df,DOWNLOAD_PATH)
        r = requests.get(item)
        with open(DOWNLOAD_PATH,'wb') as f:
            f.write(r.content)
    except Exception as e: print(e)
This is what the data looks like. Is this right? I thought the data would look a lot cleaner if it was converted from JSON to CSV.
Upvotes: 0
Views: 825
Reputation: 1584
You are close, here's what you need to change:

df = pd.read_json(r.text, lines=True)

- make sure to specify lines=True here, because some of your data contains \n characters
- after parsing, write the frame out with df.to_csv(...)
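To illustrate what lines=True does (with two made-up records, not the real hospital data): each line of the input is parsed as its own JSON object and becomes one row of the frame.

import io
import pandas as pd

# two records in newline-delimited JSON (NDJSON) form
ndjson = '{"code": "A10", "description": "MRI scan"}\n{"code": "B20", "description": "CT scan"}'

df = pd.read_json(io.StringIO(ndjson), lines=True)
print(df)
#   code description
# 0  A10    MRI scan
# 1  B20     CT scan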
All in all, there are some things in your code that could be removed, e.g. you're calling requests.get
twice for no real reason, which slows your code down substantially.
import requests
import pandas as pd
all_links = ['https://www.baptisthealthsystem.com/docs/global/standard-charges/474131755_abrazomaranahospital_standardcharges.json?sfvrsn=9a27928_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621861138_abrazocavecreekhospital_standardcharges.json?sfvrsn=674fd6f_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621809851_abrazomesahospital_standardcharges.json?sfvrsn=13953222_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621811285_abrazosurprisehospital_standardcharges.json?sfvrsn=c8113dcf_2']
for item in all_links:
    try:
        # slice out the hospital name, e.g. 'abrazomaranahospital'
        first_under = item.find('_') + 1
        last_under = item.rfind('?') - 21
        file_name = item[first_under:last_under]
        r = requests.get(item)
        df = pd.read_json(r.text, lines=True)
        DOWNLOAD_PATH = 'C:\\Users\\ryans\\Desktop\\hospital_data\\' + file_name + '.csv'
        # open in text mode; newline='' avoids blank lines on Windows
        with open(DOWNLOAD_PATH, 'w', newline='') as f:
            df.to_csv(f)
    except Exception as e:
        print(e)
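As a side note, the index arithmetic for file_name is easy to break. One alternative sketch, assuming the URLs keep their current <ein>_<hospitalname>_standardcharges.json pattern, derives the name from the URL path instead:

from urllib.parse import urlparse
from pathlib import PurePosixPath

def file_name_from(url):
    # path stem, e.g. '474131755_abrazomaranahospital_standardcharges'
    stem = PurePosixPath(urlparse(url).path).stem
    # keep the middle piece, e.g. 'abrazomaranahospital'
    return stem.split('_')[1]

print(file_name_from(all_links[0]))  # abrazomaranahospital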
Upvotes: 1
Reputation: 186
I am not sure if you are asking for an opinion. Is CSV necessary? Or are you just trying to save it in a readable (and, preferably, programmatically extractable) format?
Anyways, here is the way to export one result to a .csv
file:
# Required imports
import csv

"""I am assuming you already made the request and got the response"""
data_set = r.json()  # a list of python dict objects
DOWNLOAD_PATH = 'C:\\Users\\ryans\\Desktop\\hospital_data\\' + file_name + '.csv'
# (Please take enough care to not reveal personal info above)

# getting the headers from the first record
headers = data_set[0].keys()

with open(DOWNLOAD_PATH, "wt", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(headers)  # writing the headers
    for data in data_set:  # writing the values
        writer.writerow(data.values())
And here is the way to export one result to a .json
file:
import json

data_set = r.json()
DOWNLOAD_PATH = 'C:\\Users\\ryans\\Desktop\\hospital_data\\' + file_name + '.json'

with open(DOWNLOAD_PATH, "wt") as file:
    json.dump(data_set, file, indent=2)  # indent makes the file human-readable
You can use whichever file is more readable to you. Note that the above methods were for a single request result. You will have to do the above for each result you extract from the website.
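For example, wrapping the CSV version in the question's loop might look like this (a sketch, assuming each response parses to a list of flat dicts as above):

import csv
import requests

for item in all_links:
    file_name = item[item.find('_') + 1:item.rfind('?') - 21]
    data_set = requests.get(item).json()
    path = 'C:\\Users\\ryans\\Desktop\\hospital_data\\' + file_name + '.csv'
    with open(path, "wt", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(data_set[0].keys())  # headers from the first record
        for data in data_set:
            writer.writerow(data.values())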
Credits for the CSV method: GeeksforGeeks
Upvotes: 0