I wrote a simple program to scrape data from https://stats.nba.com. This code works fine and retrieves the data from the website:
```python
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = webdriver.ChromeOptions()
d = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
d.get('https://stats.nba.com/teams/advanced/?sort=W&dir=-1')
scrape = BeautifulSoup(d.page_source, 'html.parser').find('table')
for row in scrape.find_all('tr'):
    for col in row.find_all('td'):
        pass  # ...more parsing code here
```
However, as soon as I add `chrome_options.add_argument('--headless')`, the entire code fails and I get `AttributeError: 'NoneType' object has no attribute 'find_all'`.
Why does this happen? I've looked everywhere and cannot find a solution. Thanks!
Edit: the problem seems to be that `d.page_source` returns different HTML in headless and non-headless mode. Does anyone know why there is a discrepancy?
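The `AttributeError` itself is a symptom: BeautifulSoup's `find('table')` returns `None` when the HTML it is given contains no `<table>` element, and calling `find_all` on `None` raises exactly this error. A minimal sketch of a guard that turns this into a clearer message follows; `find_table_rows` is a hypothetical helper name, not part of any library.

```python
from bs4 import BeautifulSoup


def find_table_rows(page_source):
    """Return the <tr> elements of the first <table>, or raise a clear error."""
    table = BeautifulSoup(page_source, 'html.parser').find('table')
    if table is None:
        # find() returns None when no <table> is present, e.g. when the
        # server sent a different page to headless Chrome.
        raise ValueError('no <table> in page source; the page may not have rendered')
    return table.find_all('tr')


if __name__ == '__main__':
    print(len(find_table_rows('<table><tr><td>1</td></tr></table>')))  # 1
    try:
        find_table_rows('<html><body>no data here</body></html>')
    except ValueError as exc:
        print(exc)
```

Calling it as `find_table_rows(d.page_source)` makes it obvious when the headless browser received a page without the stats table.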
Edit:
I think I've found the solution. It appears that the site checks the browser's user agent and blocks headless Chrome, so try adding this to your code:
```python
# ...
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
chrome_options.add_argument(f'user-agent={user_agent}')
# ...
```
This is the output that I receive from that:
```python
scrape = BeautifulSoup(d.page_source, 'html.parser').find('table')
for row in scrape.find_all('tr'):
    print(row)

# <tr>
# <th></th>
# <th cf="" class="text" data-field="TEAM_NAME" ripple="" sort=""><br/>TEAM</th>
# <th cf="" data-dir="-1" data-field="GP" data-rank="" ripple="" sort="">GP</th>
# <th cf="" class="sorted asc" data-dir="-1" data-field="W" data-rank="" ripple="" sort="">W</th>
# <th cf="" data-dir="-1" data-field="L" data-rank="" ripple="" sort="">L</th>
```
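The header cells in this output carry a `data-field` attribute (`TEAM_NAME`, `GP`, `W`, `L`), which can serve as column keys for the rest of the parsing. A minimal sketch, assuming the header is the first `<tr>` of the table as in the output above; `header_fields` and `HEADER_HTML` are hypothetical names, and the sample HTML is only the truncated header row shown.

```python
from typing import List

from bs4 import BeautifulSoup

# The header row from the output above (truncated after the L column).
HEADER_HTML = '''
<table><tr>
<th></th>
<th cf="" class="text" data-field="TEAM_NAME" ripple="" sort=""><br/>TEAM</th>
<th cf="" data-dir="-1" data-field="GP" data-rank="" ripple="" sort="">GP</th>
<th cf="" class="sorted asc" data-dir="-1" data-field="W" data-rank="" ripple="" sort="">W</th>
<th cf="" data-dir="-1" data-field="L" data-rank="" ripple="" sort="">L</th>
</tr></table>
'''


def header_fields(page_source: str) -> List[str]:
    """Return the data-field value of each header cell that has one."""
    table = BeautifulSoup(page_source, 'html.parser').find('table')
    if table is None:
        return []
    header_row = table.find('tr')
    if header_row is None:
        return []
    # Skip cells like the leading empty <th></th> that have no data-field.
    return [th['data-field'] for th in header_row.find_all('th') if th.has_attr('data-field')]


if __name__ == '__main__':
    print(header_fields(HEADER_HTML))  # ['TEAM_NAME', 'GP', 'W', 'L']
```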