Reputation: 37
I'm getting an error when I try to print the text content from my web scraper, which uses BeautifulSoup. This is my code:
import requests
from bs4 import BeautifulSoup
form = "Form W-2"
URL = "https://apps.irs.gov/app/picklist/list/priorFormPublication." \
"html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&criteria=formNumber&value=" \
""+form+"&isDescending=false"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="picklistContentPane")
# form_elements = results.find_all("div", class_="picklistTable")
table_elements = results.find_all("tr")
for table_element in table_elements:
    form_number = table_element.find("td", class_="LeftCellSpacer")
    form_title = table_element.find("td", class_="MiddleCellSpacer")
    form_year = table_element.find("td", class_="EndCellSpacer")
    print(form_number.text)
    print(form_title.text)
    print(form_year.text)
    print()
# print(table_elements)
The error I'm getting is:
  File "/Users/user/PycharmProjects/project/main.py", line 18, in <module>
    print(form_number.text)
AttributeError: 'NoneType' object has no attribute 'text'
I'm trying to print the text content of each td. Can anyone help? This is the website for reference.
Upvotes: 0
Views: 517
Reputation: 195438
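Your loop visits every <tr>, including rows that have no <td> cells (such as a header row). For those rows find() returns None, and accessing .text on None raises the AttributeError. One solution is to select the right <table> and only the rows (<tr>) that contain <td> tags: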
import requests
from bs4 import BeautifulSoup
form = "Form W-2"
URL = (
    "https://apps.irs.gov/app/picklist/list/priorFormPublication."
    "html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&criteria=formNumber&value="
    "" + form + "&isDescending=false"
)
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
for table_element in soup.select(".picklist-dataTable tr:has(td)"):  # <-- change the selector here
    form_number = table_element.find("td", class_="LeftCellSpacer")
    form_title = table_element.find("td", class_="MiddleCellSpacer")
    form_year = table_element.find("td", class_="EndCellSpacer")
    print(form_number.text.strip())
    print(form_title.text.strip())
    print(form_year.text.strip())
    print()
Prints:
Form W-2 P
Statement For Recipients of Annuities, Pensions, Retired Pay, or IRA Payments
1990

Form W-2 P
Statement For Recipients of Annuities, Pensions, Retired Pay, or IRA Payments
1989

Form W-2 P
Statement For Recipients of Annuities, Pensions, Retired Pay, or IRA Payments
1988

...and so on.
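If you want to see the effect of the selector without hitting the site, here is a minimal offline sketch. The HTML snippet is made up for illustration (the header texts and "Example title" are assumptions, not the real page markup); only the class names and the selector come from the code above. It also adds an optional None check, which is an extra safeguard of mine rather than something the selector requires.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one header row (<th> cells) and one data row.
html = """
<table class="picklist-dataTable">
  <tr><th>Product Number</th><th>Title</th><th>Revision Date</th></tr>
  <tr>
    <td class="LeftCellSpacer"> Form W-2 P </td>
    <td class="MiddleCellSpacer"> Example title </td>
    <td class="EndCellSpacer"> 1990 </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("tr") returns the header row too; it has no <td>, so find() gives None.
all_rows = soup.find_all("tr")
header_cell = all_rows[0].find("td", class_="LeftCellSpacer")

# The selector from the answer keeps only rows that contain a <td>.
data_rows = soup.select(".picklist-dataTable tr:has(td)")

cell_classes = ("LeftCellSpacer", "MiddleCellSpacer", "EndCellSpacer")
records = []
for row in data_rows:
    cells = [row.find("td", class_=name) for name in cell_classes]
    # Optional safeguard: skip rows where any expected cell is missing.
    if any(cell is None for cell in cells):
        continue
    records.append(tuple(cell.get_text(strip=True) for cell in cells))

print(header_cell)
print(records)
```

Here header_cell comes back as None, which is the same situation that triggered the AttributeError in the question, while records only contains the data row.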
Upvotes: 1