Andre Heslop

Reputation: 37

Beautiful Soup 'AttributeError' when getting text content

I'm getting an error when I try to print the text content from my web scraper, which uses BeautifulSoup. This is my code:

import requests
from bs4 import BeautifulSoup

form = "Form W-2"
URL = "https://apps.irs.gov/app/picklist/list/priorFormPublication." \
      "html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&criteria=formNumber&value=" \
      ""+form+"&isDescending=false"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="picklistContentPane")
# form_elements = results.find_all("div", class_="picklistTable")
table_elements = results.find_all("tr")
for table_element in table_elements:
    form_number = table_element.find("td", class_="LeftCellSpacer")
    form_title = table_element.find("td", class_="MiddleCellSpacer")
    form_year = table_element.find("td", class_="EndCellSpacer")
    print(form_number.text)
    print(form_title.text)
    print(form_year.text)
    print()
# print(table_elements)

The error I'm getting is:

  File "/Users/user/PycharmProjects/project/main.py", line 18, in <module>
    print(form_number.text)
AttributeError: 'NoneType' object has no attribute 'text'

I'm trying to print the text content of each td. Can anyone help? This is the website for reference

Upvotes: 0

Views: 517

Answers (1)

Andrej Kesely

Reputation: 195438

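One solution is to select the right <table> and only the rows (<tr>) that contain <td> tags. For a row without matching <td> cells (for example a header row built from <th> cells), find() returns None, and calling .text on None raises the AttributeError: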

import requests
from bs4 import BeautifulSoup

form = "Form W-2"
URL = (
    "https://apps.irs.gov/app/picklist/list/priorFormPublication."
    "html?resultsPerPage=200&sortColumn=sortOrder&indexOfFirstRow=0&criteria=formNumber&value="
    "" + form + "&isDescending=false"
)
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
for table_element in soup.select(".picklist-dataTable tr:has(td)"):  # <-- change the selector here
    form_number = table_element.find("td", class_="LeftCellSpacer")
    form_title = table_element.find("td", class_="MiddleCellSpacer")
    form_year = table_element.find("td", class_="EndCellSpacer")
    print(form_number.text.strip())
    print(form_title.text.strip())
    print(form_year.text.strip())
    print()

Prints:

Form W-2 P
Statement For Recipients of Annuities, Pensions, Retired Pay, or IRA Payments
1990

Form W-2 P
Statement For Recipients of Annuities, Pensions, Retired Pay, or IRA Payments
1989

Form W-2 P
Statement For Recipients of Annuities, Pensions, Retired Pay, or IRA Payments
1988


...and so on.
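
If you want to see the failure and both fixes without hitting the network, here is a minimal sketch on an inline HTML snippet. The markup, the sample row values, and the helper names extract_rows and extract_rows_css are made up for illustration (they only loosely mimic the IRS table), so treat this as a demonstration of the idea rather than the exact page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a header row with <th> cells and one data row.
# This is an assumption for illustration, not a copy of the real page.
SAMPLE_HTML = """
<table class="picklist-dataTable">
  <tr><th>Product Number</th><th>Title</th><th>Revision Date</th></tr>
  <tr>
    <td class="LeftCellSpacer"> Form W-2 </td>
    <td class="MiddleCellSpacer"> Wage and Tax Statement </td>
    <td class="EndCellSpacer"> 2020 </td>
  </tr>
</table>
"""

CELL_CLASSES = ("LeftCellSpacer", "MiddleCellSpacer", "EndCellSpacer")


def extract_rows(html):
    """Loop over every <tr> and skip rows where find() returned None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [tr.find("td", class_=name) for name in CELL_CLASSES]
        if any(cell is None for cell in cells):
            # Header row (or any row missing one of the cells): nothing to read.
            continue
        rows.append(tuple(cell.get_text(strip=True) for cell in cells))
    return rows


def extract_rows_css(html):
    """Same result, but let the CSS selector drop rows without <td>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select(".picklist-dataTable tr:has(td)"):
        cells = [tr.find("td", class_=name) for name in CELL_CLASSES]
        rows.append(tuple(cell.get_text(strip=True) for cell in cells))
    return rows


print(extract_rows(SAMPLE_HTML))
print(extract_rows_css(SAMPLE_HTML))
```

The explicit None check is a bit more defensive, since it also survives a data row that is missing one of the three cells; the selector version is shorter when you trust the table layout.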

Upvotes: 1
