jbenfleming

Reputation: 75

Incorrect number of html elements being returned

I'm scraping from the following page: https://www.pro-football-reference.com/boxscores/201809060phi.htm

I have this code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com/boxscores/201809060phi.htm'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

tables = soup.findAll("div",{"class":"table_outer_container"})
print (len(tables))

Each table on the page is wrapped in a div with the class "table_outer_container", but my print statement only outputs 1. Am I wrong in believing that findAll will assign all of those elements to the variable "tables"?

Upvotes: 0

Views: 28

Answers (1)

SIM

Reputation: 22440

It's because most of the tables on that page are wrapped in HTML comments, so BeautifulSoup treats them as comment text rather than as elements, and your script won't find them. Strip the comment markers <!-- and --> from the response before parsing. Try the following; it should give you 20 tables from that page.

import requests
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com/boxscores/201809060phi.htm'

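# Fetch the raw HTML as text.
r = requests.get(url).text
# Strip the comment markers so the commented-out tables are parsed as real elements.
res = r.replace("<!--","").replace("-->","")
soup = BeautifulSoup(res, 'lxml')

# find_all is the current name for the older findAll alias.
tables = soup.find_all("div",{"class":"table_outer_container"})
print(len(tables))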
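
If you would rather not edit the raw response text, another option is to leave the markup alone and parse each comment separately. This is only a sketch of that idea: the function name find_table_containers and the sample HTML are made up for illustration (they are not taken from the real page), and it uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup, Comment


def find_table_containers(html):
    """Return table_outer_container divs, including ones inside HTML comments.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Divs that are already part of the live document tree.
    containers = list(soup.find_all("div", class_="table_outer_container"))
    # Comments are stored as Comment strings, so parse each one as its own fragment.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        fragment = BeautifulSoup(comment, "html.parser")
        containers.extend(fragment.find_all("div", class_="table_outer_container"))
    return containers


# Made-up sample: one visible table and two hidden inside comments.
sample = """
<div class="table_outer_container"><table id="visible"></table></div>
<!-- <div class="table_outer_container"><table id="hidden_a"></table></div> -->
<!-- <div class="table_outer_container"><table id="hidden_b"></table></div> -->
"""

print(len(find_table_containers(sample)))  # prints 3
```

Unlike the replace approach, this keeps any ordinary comments in the page from being turned into markup by accident, at the cost of returning elements that belong to separate parsed fragments rather than to one soup.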

Upvotes: 1
