Reputation: 25
this is my first attempt at coding so please forgive my daftness. I'm trying to learn web scraping by practising with this link: https://data.gov.sg/dataset/industrial-arbitration-court-awards-by-nature-of-trade-disputes?view_id=d3e444ef-54ed-4d0b-b715-1ee465f6d882&resource_id=c24d0d00-2d12-4f68-8fc9-4121433332e0
I've honestly spent hours trying to figure out what's wrong with my code here:
import csv
import requests
from BeautifulSoup import BeautifulSoup
url = 'https://data.gov.sg/dataset/industrial-arbitration-court-awards-by-nature-of-trade-disputes?view_id=d3e444ef-54ed-4d0b-b715-1ee465f6d882&resource_id=c24d0d00-2d12-4f68-8fc9-4121433332e0'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('tbody')
list_of_rows = []
for row in table.find('tr'):
list_of_cells = []
for cell in row.findAll('td'):
list_of_cells.append()
list_of_rows.append(list_of_cells)
outfile = open("./indarb.csv","wb")
writer = csv.writer(outfile)
My terminal then spits out this: 'NoneType' object has no attribute 'find', saying there's an error in line 13. Not sure if it helps in queries but this is a list of what I've tried:
Different permutations of 'find'/'findAll'
Different permutations for line 10
Different permutations for line 13
I've really tried but can't figure out why I can't scrape that simple 10 row table. If anyone could post code that works, that'd be phenomenal. Thank you for your patience!
Upvotes: 0
Views: 761
Reputation: 180482
You have a few mistakes, the biggest is you are using BeautifulSoup3 which has not been developed for years, you should be use bs4, you also need to use find_all
when you want want multiple tags. Also you have not passed cell to list_of_cells.append()
on line 13 so that is the cause of your other error:
from bs4 import BeautifulSoup
url = 'https://data.gov.sg/dataset/industrial-arbitration-court-awards-by-nature-of-trade-disputes?view_id=d3e444ef-54ed-4d0b-b715-1ee465f6d882&resource_id=c24d0d00-2d12-4f68-8fc9-4121433332e0%27'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table')
list_of_rows = []
for row in table.find_all('tr'):
list_of_cells = []
for cell in row.find_all('td'):
list_of_cells.append(cell)
list_of_rows.append(list_of_cells)
I am not sure exactly what you want but that appends the tds from the first table on the page. There is also and api you can use and adownloadable csv if you do actually want the data.
Upvotes: 0