H D

Reputation: 81

Python - Web Scraping: TypeError: string indices must be integers

I am trying to scrape a website and write the data to a CSV file for practice. When the script reaches the point where it collects the data and stores it in variables, I get this error:

TypeError: string indices must be integers

It refers to this line:

 email = address['email'].strip()

I want it to collect all the data and write it to a CSV file. The whole code is as follows:

from urllib.request import urlopen as uReq
import json
import re
import csv

my_url = 'https://www.haart.co.uk/umbraco/api/branches/getsales/HRT'
uClient = uReq(my_url)
page_json = uClient.read()
uClient.close()
records = []
filename = 'haartscrape.csv'

addresses = json.loads(page_json)

for address in addresses:
    headline = address['headline']
    address = re.sub(r'\<.*?\>', '', address['address'])
    email = address['email'].strip()
    tel = address['telephone']


    records.append({'Name':headline, 'Address':address, 'Email': email, 'Telephone':tel})

with open(filename, 'w') as f:
    writer = csv.DictWriter(f, ['Name', 'Address', 'Email', 'Telephone'])
    writer.writeheader()
    for r in records:
        writer.writerow(r)

Full Traceback:

Traceback (most recent call last):
  File "haart_webscrape.py", line 18, in <module>
    email = address['email'].strip()
TypeError: string indices must be integers

Any help is appreciated. Thank you in advance.

Upvotes: 1

Views: 222

Answers (1)

OneCricketeer

Reputation: 191681

You're reassigning the loop variable that holds your JSON element, so on the next line address is a string rather than a dict:

for address in addresses:
    headline = address['headline']
    address = re.sub(r'\<.*?\>', '', address['address'])  # here: address is now a str
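
To see why the message talks about string indices, here is a minimal sketch that reproduces the error without touching the network. The sample record is made up for illustration and is not taken from the real API response.

```python
import re

# Hypothetical sample record; only the key names come from the question.
address = {'address': '<p>1 High Street</p>', 'email': ' branch@example.com '}

# Rebinding the name: `address` now refers to the cleaned string, not the dict.
address = re.sub(r'<.*?>', '', address['address'])

try:
    address['email']  # indexing a str with a str key
except TypeError as exc:
    error_message = str(exc)

print(type(address).__name__, '-', error_message)
```

A str only accepts integer or slice indices, so any lookup by key fails once the dict has been replaced.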

Either rename the loop variable, or rename the variable that holds the cleaned address string.
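
As a rough sketch of the renaming approach: the helper name clean_record, the names branch and cleaned_address, and the sample data below are all assumptions for illustration. Only the four key names come from the question.

```python
import re

def clean_record(branch):
    """Build one CSV row from a branch dict without rebinding `branch`."""
    # The cleaned text gets its own name, so `branch` stays a dict.
    cleaned_address = re.sub(r'<.*?>', '', branch['address'])
    return {
        'Name': branch['headline'],
        'Address': cleaned_address,
        'Email': branch['email'].strip(),
        'Telephone': branch['telephone'],
    }

# Hypothetical sample record standing in for one element of the JSON list.
sample = {
    'headline': 'Example Branch',
    'address': '<p>1 High Street</p>',
    'email': ' branch@example.com ',
    'telephone': '01234 567890',
}
row = clean_record(sample)
print(row)
```

In the original loop this amounts to calling records.append(clean_record(address)) for each element of addresses.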

Or build each row directly while writing the file:

with open(filename, 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    writer = csv.DictWriter(f, ['Name', 'Address', 'Email', 'Telephone'])
    writer.writeheader()
    for address in addresses:
        # Build the row without rebinding the loop variable
        r = {
            'Name': address['headline'],
            'Address': re.sub(r'\<.*?\>', '', address['address']),  # strip HTML tags
            'Email': address['email'].strip(),
            'Telephone': address['telephone'],
        }
        writer.writerow(r)

Upvotes: 2
