Rapt0r1

Reputation: 3

Why is .get('href') returning "None" on a bs4.element.Tag?

I'm pulling together a dataset to do analysis on. The goal is to parse a table on an SEC webpage and pull out the link in the row that contains the text "SC 13D". This needs to be repeatable so I can automate it across a large list of links I have in a database. I know this code is not the most Pythonic, but I hacked it together to get what I need out of the table, except for the link in the table row. How can I extract the href value from the table row?

I tried doing a .findAll on 'tr' instead of 'td' in the table (Line 15), but couldn't figure out how to search for "SC 13D" and pop the element from the list of table rows the way I can with .findAll('td'). I also tried to just get the anchor tag with the link in it using .get('a') instead of .get('href') (included in the code, line 32), but it also returns "None".

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://www.sec.gov/Archives/edgar/data/1050122/000101143807000336/0001011438-07-000336-index.htm'

html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table',{'summary':'Document Format Files'})
rows = table.findAll("td")

i = 0
pos = 0
for row in rows:
    if "SC 13D" in row:
        pos = i
        break
    else: i = i + 1

linkpos = pos - 1

linkelement = rows[linkpos]

print(linkelement.get('a'))
print(linkelement.get('href'))

The expected result is that the link in linkelement is printed. The actual result is "None".

Upvotes: 0

Views: 2978

Answers (2)

Maaz

Reputation: 2445

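It is because your a tag is inside your td tag. .get() looks up an attribute on the tag you call it on, and the td itself has no href attribute (nor an attribute named a), so it returns None. You just have to find the a tag first: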

linkelement = rows[linkpos]
a_element = linkelement.find('a')

print(a_element.get('href'))
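
To make the difference concrete, here is a minimal sketch that runs without a network request. The HTML string is a made-up stand-in for one row of the table (the real SEC markup may differ), and the names html, link_cell, td_href and a_href are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one table row; not copied from the SEC page.
html = (
    '<table><tr>'
    '<td>1</td>'
    '<td><a href="/Archives/example.txt">example.txt</a></td>'
    '<td>SC 13D</td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, 'html.parser')
link_cell = soup.find_all('td')[1]   # the cell that wraps the link

# .get() reads attributes of the tag it is called on; <td> has no href.
td_href = link_cell.get('href')

# Find the nested <a> first, then read its href attribute.
a_tag = link_cell.find('a')
a_href = a_tag.get('href') if a_tag is not None else None

print(td_href)
print(a_href)
```

The None check matters if some cells have no link, because .find('a') returns None in that case and calling .get on None raises an AttributeError.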

Upvotes: 1

chitown88

Reputation: 28595

Switch your .get to .find.

You want to find the <a> tag inside the cell, then print its href attribute:

print(linkelement.find('a')['href'])

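Or use .get on the <a> tag itself rather than on the <td> (unlike ['href'], .get returns None instead of raising a KeyError when the attribute is missing):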

print(linkelement.a.get('href'))
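
Since the question also asks about searching by 'tr' instead of 'td', here is one possible sketch of that approach. The helper name find_filing_link and the sample HTML are hypothetical; the sample only mimics the general shape of the table (summary attribute, a link cell, a type cell) and is not the real SEC markup.

```python
from bs4 import BeautifulSoup

def find_filing_link(table, doc_type="SC 13D"):
    """Return the href of the first <a> in the first row that has a cell
    whose stripped text equals doc_type, or None if there is no such row."""
    for tr in table.find_all('tr'):
        cell_texts = [td.get_text(strip=True) for td in tr.find_all('td')]
        if doc_type in cell_texts:
            a_tag = tr.find('a')
            if a_tag is not None:
                return a_tag.get('href')
    return None

# Hypothetical sample data standing in for the downloaded page.
sample_html = '''
<table summary="Document Format Files">
  <tr><th>Seq</th><th>Document</th><th>Type</th></tr>
  <tr><td>1</td><td><a href="/Archives/form.txt">form.txt</a></td><td>SC 13D</td></tr>
  <tr><td>2</td><td><a href="/Archives/exhibit.txt">exhibit.txt</a></td><td>EX-99</td></tr>
</table>
'''
sample_table = BeautifulSoup(sample_html, 'html.parser').find(
    'table', {'summary': 'Document Format Files'}
)

print(find_filing_link(sample_table))
```

Working row by row avoids the index arithmetic (pos - 1) in the question. If the hrefs on the real page turn out to be relative, urllib.parse.urljoin(url, href) can turn them into full URLs.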

Upvotes: 0
