Reputation: 121

Python BeautifulSoup accessing attribute in tag

I am using whitehouse.gov to practice scraping web data. I have

    for a_tag in soup.select('span a'):
        categories.append(a_tag)

which gives me the a tags like below...

    <a href="https://www.whitehouse.gov/briefing-room/./statements-releases/" rel="category tag">Statements and Releases</a>

Now I want to access just the text "Statements and Releases", so I thought I would do

    for a_tag in soup.select('span a'):
        categories.append(a_tag.attrs['rel'])

but this gives me ['category', 'tag'] as the output. I was playing around a little and figured out

    for a_tag in soup.select('span a'):
        for x in a_tag: 
            categories.append(x)

gives me the output I'm looking for (Statements and Releases, etc.), but I'm not sure why.

Upvotes: 0

Answers (2)

Reputation: 3537

To get the text inside an a tag, use its text attribute:

    for a_tag in soup.select('span a'):
        categories.append(a_tag.text)

or, if the tag contains nothing but a single string, its string attribute:

    for a_tag in soup.select('span a'):
        categories.append(a_tag.string)
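
As a rough illustration of how .text and .string differ, here is a small self-contained sketch. The HTML is a made-up sample: the first link is the one from the question, and the second link (with a nested b tag) is a hypothetical addition, included only to show the case where .string gives None.

```python
from bs4 import BeautifulSoup

# Sample markup: the first <a> is from the question, the second is hypothetical.
html = (
    '<span><a href="https://www.whitehouse.gov/briefing-room/./statements-releases/" '
    'rel="category tag">Statements and Releases</a></span>'
    '<span><a href="#">Press <b>Briefings</b></a></span>'
)

soup = BeautifulSoup(html, "html.parser")
links = soup.select("span a")

# .text concatenates every string inside the tag, including nested tags.
texts = [a_tag.text for a_tag in links]

# .string only works when the tag has exactly one child; otherwise it is None.
strings = [a_tag.string for a_tag in links]

print(texts)
print(strings)
```

For the simple links in the question either attribute should do; .text is probably the safer default if some links might contain nested markup.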

Upvotes: 1

Reputation: 20052

I suspect that this is what you need if you want Statements and Releases:

    for a_tag in soup.select('span a'):
        categories.append(a_tag.get_text())

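Whereas doing this:

    for a_tag in soup.select('span a'):
        categories.append(a_tag.attrs['rel'])

produces the value of the rel attribute, which is category tag. BeautifulSoup treats rel as a multi-valued attribute, so it comes back as the list ['category', 'tag'].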
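
To make the attribute-versus-text distinction concrete, and to show why the nested loop in the question happened to work, here is a minimal sketch using the tag from the question wrapped in a span. It assumes the built-in html.parser; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<span><a href="https://www.whitehouse.gov/briefing-room/./statements-releases/" '
    'rel="category tag">Statements and Releases</a></span>'
)
soup = BeautifulSoup(html, "html.parser")
a_tag = soup.select_one("span a")

# Attributes live in a dict-like mapping; rel is multi-valued, so it is a list.
rel_values = a_tag["rel"]
rel_joined = " ".join(rel_values)

# Single-valued attributes such as href come back as plain strings.
href = a_tag["href"]

# Iterating over a tag yields its direct children. Here the only child is the
# text node, which is why "for x in a_tag" produced the link text.
children = list(a_tag)

print(rel_values, rel_joined, href, children)
```

Note that iterating only yields the text when the link has no nested tags; with nested markup the children would include tag objects too, so get_text() is the more dependable choice.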

Upvotes: 1