Oliverater

Reputation: 61

Print JSON data from csv list of multiple urls

Very new to Python and I haven't found a specific answer on SO, so apologies in advance if this is naive or covered elsewhere already.

I am trying to print the 'IncorporationDate' JSON value from multiple URLs of a public data set. I have the URLs saved in a CSV file, snippet below. So far I can only print ALL of the JSON data from a single URL, and I am uncertain how to run that over all of the URLs in the CSV and write just the IncorporationDate values to a CSV.

Any basic guidance or edits are really welcomed!

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

import json


def get_jsonparsed_data(url):

    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)


url = ("http://data.companieshouse.gov.uk/doc/company/01046514.json")
print(get_jsonparsed_data(url))

import csv
with open('test.csv') as f:
    lis=[line.split() for line in f]
    for i,x in enumerate(lis):              
        print ()

import StringIO
s = StringIO.StringIO()
with open('example.csv', 'w') as f:
    for line in s:
        f.write(line)

Snippet of csv:

http://business.data.gov.uk/id/company/01046514.json
http://business.data.gov.uk/id/company/01751318.json
http://business.data.gov.uk/id/company/03164710.json
http://business.data.gov.uk/id/company/04403406.json
http://business.data.gov.uk/id/company/04405987.json

Upvotes: 1

Views: 607

Answers (2)

Sean Parsons

Reputation: 762

Welcome to the Python world.

  • For making HTTP requests, we commonly use requests because of its dead-simple API.

The code snippet below does what I believe you want:

  1. It grabs the data from each of the URLs you posted
  2. It creates a new CSV file with each of the IncorporationDate values.

```
import csv
import requests

COMPANY_URLS = [
    'http://business.data.gov.uk/id/company/01046514.json',
    'http://business.data.gov.uk/id/company/01751318.json',
    'http://business.data.gov.uk/id/company/03164710.json',
    'http://business.data.gov.uk/id/company/04403406.json',
    'http://business.data.gov.uk/id/company/04405987.json',
]

def get_company_data():
    for url in COMPANY_URLS:
        res = requests.get(url)
        if res.status_code == 200:
            yield res.json()

if __name__ == '__main__':
    for data in get_company_data():
        try:
            incorporation_date = data['primaryTopic']['IncorporationDate']
        except KeyError:
            continue
        else:
            with open('out.csv', 'a') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([incorporation_date])
```
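A small variation worth noting: since the snippet above reopens `out.csv` in append mode for every company, repeated runs will keep adding rows. Opening the file once and writing all rows in a single pass avoids that. A minimal sketch of just the writing step (the helper name and the sample dates are made up for illustration):

```python
import csv

def write_incorporation_dates(dates, path):
    # Open the output file once and emit one row per date,
    # with a header row so the column is self-describing.
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['IncorporationDate'])
        for date in dates:
            writer.writerow([date])

# Example with made-up dates:
write_incorporation_dates(['1972-05-11', '1983-07-01'], 'out.csv')
```

In practice you would pass it the dates collected from `get_company_data()` above.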

Upvotes: 1

ExtractTable.com

Reputation: 811

First step: read all the URLs from your CSV file.

import csv

with open('test.csv', newline='') as f:
    csvReader = csv.reader(f)
    # next(csvReader)  # uncomment if you have a header in the CSV file
    all_urls = [row[0] for row in csvReader if row]

Second step: fetch the data from a URL.

import json
from urllib.request import urlopen

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)

url_data = get_jsonparsed_data("give_your_url_here")

Third step:

  1. Go through all URLs that you got from CSV file
  2. Get JSON data
  3. Fetch the field you need, in your case "IncorporationDate"
  4. Write it to an output CSV file; I'm naming it IncorporationDates.csv

Code below:

with open('IncorporationDates.csv', 'w', newline='') as abc:
    writer = csv.writer(abc)
    for each_url in all_urls:
        url_data = get_jsonparsed_data(each_url)
        writer.writerow([url_data['primaryTopic']['IncorporationDate']])

Note the file is opened once, outside the loop: opening it with 'w' inside the loop would truncate it on every iteration, leaving only the last date.
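The key-path lookup in step 3 is also worth isolating into a small helper so a record missing the field doesn't crash the whole loop. A minimal sketch (the helper name and the sample JSON string are made up for illustration, but the `primaryTopic`/`IncorporationDate` path matches the code above):

```python
import json

def extract_incorporation_date(raw_json):
    # Pull IncorporationDate out of a raw company JSON document;
    # return None if the JSON is invalid or the key path is missing.
    try:
        return json.loads(raw_json)['primaryTopic']['IncorporationDate']
    except (KeyError, TypeError, ValueError):
        return None

# Made-up sample mimicking the structure used above:
sample = '{"primaryTopic": {"IncorporationDate": "1972-05-11"}}'
print(extract_incorporation_date(sample))  # 1972-05-11
```

You could then skip `None` results in the loop instead of letting a `KeyError` propagate.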

Upvotes: 1
