Reputation: 25

Web-scraping with Python: NoneType error, can't scrape table's data

This is my first attempt at coding, so please forgive my daftness. I'm trying to learn web scraping by practising with this link: https://data.gov.sg/dataset/industrial-arbitration-court-awards-by-nature-of-trade-disputes?view_id=d3e444ef-54ed-4d0b-b715-1ee465f6d882&resource_id=c24d0d00-2d12-4f68-8fc9-4121433332e0

I've honestly spent hours trying to figure out what's wrong with my code here:

import csv
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://data.gov.sg/dataset/industrial-arbitration-court-awards-by-nature-of-trade-disputes?view_id=d3e444ef-54ed-4d0b-b715-1ee465f6d882&resource_id=c24d0d00-2d12-4f68-8fc9-4121433332e0'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('tbody')

list_of_rows = []
for row in table.find('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        list_of_cells.append()
    list_of_rows.append(list_of_cells)

outfile = open("./indarb.csv","wb")
writer = csv.writer(outfile)

My terminal then spits out this: 'NoneType' object has no attribute 'find', saying there's an error in line 13. Not sure if it helps in queries but this is a list of what I've tried:

Different permutations of 'find'/'findAll'

Instead of '.find', used '.findAll'
Instead of '.findAll', used '.find'

Different permutations for line 10

Tried soup.find('tbody')
Tried soup.find('table')
Opened source code, tried soup.find('table', attrs={'class':'table table-condensed'})

Different permutations for line 13

similarly tried with just 'tr' tag; or
tried adding 'attrs={}' stuff

I've really tried but can't figure out why I can't scrape that simple 10 row table. If anyone could post code that works, that'd be phenomenal. Thank you for your patience!

Upvotes: 0

Answers (2)

Padraic Cunningham

Reputation: 180482

You have a few mistakes. The biggest is that you are using BeautifulSoup 3, which has not been developed for years; you should use bs4 instead. You also need to use find_all when you want multiple tags, since find returns only the first match. Also, you have not passed cell to list_of_cells.append(), so that is the cause of your other error:

import requests
from bs4 import BeautifulSoup

url = 'https://data.gov.sg/dataset/industrial-arbitration-court-awards-by-nature-of-trade-disputes?view_id=d3e444ef-54ed-4d0b-b715-1ee465f6d882&resource_id=c24d0d00-2d12-4f68-8fc9-4121433332e0'
response = requests.get(url)
html = response.content

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
# find() returns only the first match, or None if there is none
table = soup.find('table')

list_of_rows = []
# find_all() returns every matching tag
for row in table.find_all('tr'):
    list_of_cells = []
    for cell in row.find_all('td'):
        list_of_cells.append(cell)
    list_of_rows.append(list_of_cells)

I am not sure exactly what you want, but that appends the tds from the first table on the page. There is also an API you can use and a downloadable CSV if you do actually want the data.
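The code in the question stops right after creating the csv.writer, so nothing is ever written to the file. A minimal sketch of that last step, assuming Python 3 and assuming each cell has already been reduced to a plain string (for example with cell.get_text(strip=True)); the helper name write_rows and the demo values are made up for illustration:

```python
import csv
import io


def write_rows(rows, outfile):
    """Write an iterable of rows (each a list of strings) to outfile as CSV."""
    writer = csv.writer(outfile)
    writer.writerows(rows)


# In-memory demonstration with placeholder values (not the real table data).
demo = io.StringIO(newline="")
write_rows([["col_a", "col_b"], ["1", "2"]], demo)
```

To write to disk instead, something like open("./indarb.csv", "w", newline="", encoding="utf-8") should work in Python 3; the "wb" mode from the question is the Python 2 idiom and makes csv.writer fail on Python 3, because it writes text rather than bytes.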

Upvotes: 0

Ugo T.

Reputation: 1068

The URL you request in your code is not HTML but JSON.
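One way to check this for yourself is to look at what the server actually sent back before handing it to BeautifulSoup. A rough sketch, where classify_body is a hypothetical helper (not part of requests or bs4) that only inspects the Content-Type header and, failing that, tries to parse the body:

```python
import json


def classify_body(content_type, body):
    """Return 'json', 'html' or 'unknown' for a decoded response body.

    content_type: value of the Content-Type header (may be empty).
    body: the response text.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return "json"
    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    # Header missing or unhelpful: fall back to trying to parse the body.
    try:
        json.loads(body)
        return "json"
    except ValueError:
        pass
    if "<html" in body.lower():
        return "html"
    return "unknown"
```

With requests, this could be called as classify_body(response.headers.get("Content-Type", ""), response.text). If it reports JSON, response.json() is the tool to use rather than an HTML parser.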

Upvotes: 0
