Web Scraping Python Access Table data

Question

So I am trying to use Beautiful Soup to do some web scraping of this website http://www.killedbypolice.net/kbp2013.html and access the data in the table. My current code is:

import urllib.request
from bs4 import BeautifulSoup

url = "http://www.killedbypolice.net/kbp2013.html"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

data = soup.find_all('table')
data[0]

But I am getting a "maximum recursion depth exceeded" runtime error. I'm not sure how to access the 'td' fields inside the table, which hold the data. Thanks

Padraic Cunningham · Accepted Answer

The error occurs because the HTML is very badly formatted. You get RuntimeError: maximum recursion depth exceeded when creating the soup object with both lxml and html.parser; the only parser that works at all is html5lib:

import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.killedbypolice.net/kbp2013.html").content
soup = BeautifulSoup(html, "html5lib")

# get the first table on the page
table = soup.find("table")

That gets the table:

    
# since Jan 1st '14 | St. | g/r | Name, Age | | KBP link (plus extensive follow-ups) | News link
(2) May 2, 2013 | CA | M/B | Kenneth Bernard Williams, 55 | G | facebook.com/KilledByPolice/posts/622539181107556 | http://www.nbclosangeles.com/news/local/Police-Shoot-Kill-Suspect-in-Skid-Row-Prompting-Angry-Crowd-to-Gather-205646861.html
(1) May 1, 2013 | MI | M/B | Jordan West-Morson, 26 | G | facebook.com/KilledByPolice/posts/1033800406648096 | http://www.mlive.com/news/detroit/index.ssf/2013/09/detroit_transit_officer_charge.html
    Detroit transit officer not guilty in fatal shooting: http://www.clickondetroit.com/news/detroit-transit-officer-not-guilty-in-fatal-shooting/32405878

But then a simple call to find_all:

print(table.find_all("tr"))

Gives you:

 AttributeError: 'NoneType' object has no attribute 'next_element'
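
That AttributeError appears to come from walking the tree bs4 built, so one option worth trying is an approach that never builds a tree. The following is only a rough sketch using the standard library's event-based html.parser.HTMLParser, which is not bs4; CellCollector and collect_cells are hypothetical names made up for illustration, and it has not been checked against the actual page:

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect table cell text row by row without building a tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False   # True while text belongs to the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new <tr> starts a new row even if the previous one was never closed.
            self.rows.append([])
            self._in_cell = False
        elif tag in ("td", "th"):
            if not self.rows:
                # Cell appeared before any <tr>; start an implicit row.
                self.rows.append([])
            # A new cell implicitly ends the previous one.
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def collect_cells(html):
    """Return a list of rows, each a list of whitespace-normalized cell texts."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [[" ".join(cell.split()) for cell in row] for row in parser.rows if row]
```

You would call it as collect_cells(html_text) on the decoded page source. Because it only reacts to start tags, end tags and text, unclosed or stray tags cannot trigger deep recursion, though nested tables would be flattened into one list of rows.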

The HTML is just a complete mess. Unfortunately, I cannot see a simple way to parse it with bs4; this may be one of the rare occasions where you need to resort to some regex.
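
To give an idea of what the regex route could look like, here is a minimal sketch. It assumes the rows start with tr tags and the cells with td or th tags, and it treats a missing closing tag as ended by the next opening one. extract_rows and the pattern names are hypothetical, and the patterns would almost certainly need tuning against the real page:

```python
import re
from html import unescape

# A row runs from a <tr> tag up to the next <tr>, a </table>, or end of input,
# so a missing </tr> does not swallow the following row.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)(?=<tr\b|</table\b|\Z)", re.I | re.S)
# A cell runs from a <td>/<th> tag up to the next cell, a </tr>, or end of the row text.
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)(?=<t[dh]\b|</tr\b|\Z)", re.I | re.S)
# Any remaining tag inside a cell (links, <b>, stray closing tags, ...).
TAG_RE = re.compile(r"<[^>]+>")


def extract_rows(html):
    """Return a list of rows, each a list of plain-text cell values."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = []
        for cell_html in CELL_RE.findall(row_html):
            # Drop inner tags, decode entities, and collapse whitespace.
            text = unescape(TAG_RE.sub(" ", cell_html))
            cells.append(" ".join(text.split()))
        if cells:
            rows.append(cells)
    return rows
```

With the page source as a string, for example requests.get(url).text, extract_rows(html) would give one list per row. Note that stripping tags this way throws away the href values, so the link columns would need a separate pattern if you want the URLs rather than the link text.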
