Reputation: 81
I am trying to scrape a website and write the data to a CSV file for practice. At the point where the script collects the data and stores it in a variable, I get this error:
TypeError: string indices must be integers
It refers to this line:
email = address['email'].strip()
I want it to collect all the data and write it to a CSV file. The whole code is as follows:
from urllib.request import urlopen as uReq
import json
import re
import csv

my_url = 'https://www.haart.co.uk/umbraco/api/branches/getsales/HRT'
uClient = uReq(my_url)
page_json = uClient.read()
uClient.close()

records = []
filename = 'haartscrape.csv'
addresses = json.loads(page_json)

for address in addresses:
    headline = address['headline']
    address = re.sub(r'\<.*?\>', '', address['address'])
    email = address['email'].strip()
    tel = address['telephone']
    records.append({'Name':headline, 'Address':address, 'Email': email, 'Telephone':tel})

with open(filename, 'w') as f:
    writer = csv.DictWriter(f, ['Name', 'Address', 'Email', 'Telephone'])
    writer.writeheader()
    for r in records:
        writer.writerow(r)
Full Traceback:
Traceback (most recent call last):
  File "haart_webscrape.py", line 18, in <module>
    email = address['email'].strip()
TypeError: string indices must be integers
Any help is appreciated. Thank you in advance.
Upvotes: 1
Views: 222
Reputation: 191681
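You're reassigning the loop variable inside the loop. After the re.sub line, address is no longer the dict from the JSON but a plain string, so address['email'] tries to index a string with a string key: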
for address in addresses:
    headline = address['headline']
    address = re.sub(r'\<.*?\>', '', address['address'])  # here: address becomes a str
Either rename the loop variable, or rename the variable that holds the cleaned address.
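As a minimal sketch of the rename, assuming each JSON element is a dict with the four keys used in the question: the sample data, the name build_records, and the names branch and clean_address are made up for illustration and are not part of the original script or the real API response.

```python
import re

# Hypothetical sample shaped like one element of the JSON response
# (only the four keys the question uses; values are invented).
addresses = [
    {
        'headline': 'Example Branch',
        'address': '<p>1 High Street</p><p>Town</p>',
        'email': ' branch@example.com ',
        'telephone': '01234 567890',
    },
]


def build_records(branches):
    """Turn each branch dict into a flat row suitable for csv.DictWriter."""
    records = []
    for branch in branches:  # loop variable no longer shares a name with the cleaned address
        headline = branch['headline']
        # Strip the HTML tags; the resulting string gets its own name.
        clean_address = re.sub(r'<.*?>', '', branch['address'])
        email = branch['email'].strip()  # branch is still the dict here
        tel = branch['telephone']
        records.append({'Name': headline, 'Address': clean_address,
                        'Email': email, 'Telephone': tel})
    return records


records = build_records(addresses)
print(records)
```

The resulting records list can be passed to the same csv.DictWriter loop as in the question.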
Or skip the intermediate variables and build each row directly while writing:
# newline='' is what the csv module docs recommend, to avoid blank rows on Windows
with open(filename, 'w', newline='') as f:
    writer = csv.DictWriter(f, ['Name', 'Address', 'Email', 'Telephone'])
    writer.writeheader()
    for address in addresses:
        # address stays the JSON dict for the whole iteration
        r = {
            'Name': address['headline'],
            'Address': re.sub(r'\<.*?\>', '', address['address']),
            'Email': address['email'].strip(),
            'Telephone': address['telephone']}
        writer.writerow(r)
Upvotes: 2