Reputation: 11
I am working with this page: https://erowid.org/general/big_chart.shtml. I am trying to extract the name of each drug by using Beautiful Soup to read the page's tables.
This code works perfectly:
chem_table = bs.find('table', id="section-CHEMICALS")
for row in chem_table.find_all('tr'):
    print(row.find('a').contents[0])
This code gives me the following error, even though both tables have the same format:
plants_table = bs.find('table', id="section-PLANTS")
for r in plants_table.find_all('tr'):
    print(r.find('a').contents[0])
This is the error I get for the second block: AttributeError: 'NoneType' object has no attribute 'contents'
However, 'print(r.find('a'))' works perfectly.
I checked whether 'a' existed by running r.find('a'), which gave the correct results. I then tried r.find('a').text, which again gave me a NoneType error.
Upvotes: 1
Views: 111
Reputation: 195553
The first row in the PLANTS table doesn't contain any <a> tag, so you need to check for that:
import requests
from bs4 import BeautifulSoup
url = 'https://erowid.org/general/big_chart.shtml'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
plants_table = soup.select_one('#section-PLANTS')
for r in plants_table.select('tr:has(a)'):
    print(r.find('a').text)
Prints:
...
tobacco
virola
voacanga_africana
wormwood
yerba_mate
yohimbe
plants_table.select('tr:has(a)') uses the CSS selector tr:has(a), which selects all <tr> tags containing at least one <a> tag.
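An explicit None check is another way to handle this. find() returns None when a row has no matching tag, which is why print(r.find('a')) runs without an error (it simply prints None for that row) while .text or .contents on the result fails. The sketch below uses a small inline HTML sample as a hypothetical stand-in for the real page, so the markup and the names list are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a header row without a link,
# followed by two rows that each contain one.
html = """
<table id="section-PLANTS">
  <tr><th>Plants</th></tr>
  <tr><td><a href="/plants/tobacco/">tobacco</a></td></tr>
  <tr><td><a href="/plants/virola/">virola</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
plants_table = soup.find("table", id="section-PLANTS")

names = []
for row in plants_table.find_all("tr"):
    link = row.find("a")  # None when the row has no <a> tag
    if link is None:
        continue  # skip rows without a link, such as the header row
    names.append(link.get_text(strip=True))

print(names)
```

This keeps the original find_all('tr') loop and may be easier to extend if rows without links need different handling rather than being skipped.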
Upvotes: 2