user7454726


Get the title of a link using BeautifulSoup

As the title says, I am trying to get the title of a link that is located inside a table cell. This is the website I am getting my data from. I have also seen this question, which is where I got my last few lines of code from, but it didn't quite get me all the way.

I am trying to get the title of the link inside the first column (that is, the first cell of each row). I can get all of the HTML in the cell, but I am having trouble extracting just the title. This is what I've come up with so far:

```python
import requests
from bs4 import BeautifulSoup

URL = 'http://theescapists.gamepedia.com/Crafting'
get_page = requests.get(URL)
plain_text = get_page.text
soup = BeautifulSoup(plain_text, 'html.parser')

for table_tag in soup.find_all('table'):
    for each_row in table_tag.find_all('tr'):
        # first <a> with an href attribute anywhere in this row
        links = each_row.find('a', href=True)
        title = links.get('title')
        print(title)
        print('')
```
        print('')

If I print just `links`, all of the HTML within each cell gets printed.

I am getting the error `AttributeError: 'NoneType' object has no attribute 'get'` when I print the title part, which confuses me because when I do `print(type(links))` I get `bs4.element.Tag` back, which makes me think I should be able to look for a `title` tag.
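To see where the `AttributeError` comes from, here is a minimal reproduction. It assumes a small hypothetical table in place of the live wiki page: a header row made only of `th` cells, followed by a data row with a link. `find()` returns `None` for any row that has no matching `a` tag, so the type depends on which row you are looking at.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one table on the wiki page: a header row of
# <th> cells (no link) followed by a data row whose first cell has a link.
html = """
<table>
  <tr><th>Item</th><th>Ingredients</th></tr>
  <tr><td><a href="/Example_Item" title="Example Item">Example Item</a></td><td>A, B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for row in soup.find_all("tr"):
    link = row.find("a", href=True)
    # find() returns None when the row has no matching <a>, so calling
    # link.get("title") on the header row would raise AttributeError.
    results.append(link)
    print(type(link))  # <class 'NoneType'> for the header, then <class 'bs4.element.Tag'>
```

So a `bs4.element.Tag` for one row does not guarantee that every row produces one.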

As a recap (this got a little long): I want to get the title of the link in the first cell of each row in each table.

Upvotes: 3

Views: 4071

Answers (2)

宏杰李

Reputation: 12158

A `tr` tag can contain `th` cells that have no `a` tag. For such rows `find` returns `None`, so you should check the result before you access it:

```python
for table_tag in soup.find_all('table'):
    for each_row in table_tag.find_all('tr'):
        links = each_row.find('a', href=True)
        if links:  # check before you access
            title = links.get('title')
            print(title)
            print('')
```
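`each_row.find('a', href=True)` returns the first link anywhere in the row, which is not necessarily in the first column. A minimal sketch that looks only at the first cell of each row is below. It assumes a hypothetical sample table (on the real page you would build `soup` from `requests.get(URL).text` as in the question), and `first_cell_titles` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table; the third row has a link only in its second cell.
html = """
<table>
  <tr><th>Item</th><th>Ingredients</th></tr>
  <tr><td><a href="/Item_A" title="Item A">Item A</a></td>
      <td><a href="/Part_1" title="Part 1">Part 1</a></td></tr>
  <tr><td>No link here</td><td><a href="/Part_2" title="Part 2">Part 2</a></td></tr>
</table>
"""

def first_cell_titles(soup):
    """Return the title of the link in the first cell of each row,
    skipping rows whose first cell has no link or the link has no title."""
    titles = []
    for table in soup.find_all("table"):
        for row in table.find_all("tr"):
            first_cell = row.find(["td", "th"])  # first cell, header or data
            if first_cell is None:
                continue
            link = first_cell.find("a", href=True)
            if link is None:  # e.g. header cells or plain-text cells
                continue
            title = link.get("title")  # None if the attribute is absent
            if title is not None:
                titles.append(title)
    return titles

soup = BeautifulSoup(html, "html.parser")
print(first_cell_titles(soup))  # ['Item A']
```

Note that `row.find('a', href=True)` on the third row would have returned the "Part 2" link from the second column; restricting the search to the first cell avoids that.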

Upvotes: 3

Icyblade

Reputation: 222

I think `links.attrs['title']` is what you want.

My code:

```python
for table_tag in soup.find_all('table'):
    for each_row in table_tag.find_all('tr'):
        links = each_row.find('a', href=True)
        try:
            title = links.attrs['title']
            print(title)
            print('')
        except AttributeError:
            pass
```

Note: the `except AttributeError` handles the table's header row, which has no link, so `links` is `None` there.
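One difference between `links.get('title')` and `links.attrs['title']` matters when a link exists but has no `title` attribute: `attrs` is a plain dict, so indexing it raises `KeyError`, which `except AttributeError` does not catch, while `Tag.get` returns `None` (or a default you pass). A minimal sketch, assuming a hypothetical link with no title:

```python
from bs4 import BeautifulSoup

# Hypothetical link that has an href but no title attribute.
link = BeautifulSoup('<a href="/Somewhere">text</a>', "html.parser").a

print(link.get("title"))              # None: .get() returns None for a missing attribute
print(link.get("title", "no title"))  # a default can be supplied, as with dict.get
try:
    link.attrs["title"]               # attrs is a dict, so a missing key raises KeyError
except KeyError:
    print("KeyError")
```

If some links on the page might lack a title, either use `.get('title')` or catch `KeyError` as well.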

Upvotes: 0
