Reputation: 121
I am using whitehouse.gov to practice scraping web data. I have:
    for a_tag in soup.select('span a'):
        categories.append(a_tag)
which gives me a tags like the one below:
<a href="https://www.whitehouse.gov/briefing-room/./statements-releases/" rel="category tag">Statements and Releases</a>
Now I want to access just the "Statements and Releases" text, so I thought I would just do:
    for a_tag in soup.select('span a'):
        categories.append(a_tag.attrs['rel'])
but this gives me ['category', 'tag'] as the output. I was playing around a little and figured out that
    for a_tag in soup.select('span a'):
        for x in a_tag:
            categories.append(x)
gives me the output I'm looking for (Statements and Releases etc.), but I'm not sure why.
Upvotes: 0
Views: 44
Reputation: 3537
To get the text inside an a tag, use its text attribute:
    for a_tag in soup.select('span a'):
        categories.append(a_tag.text)
or, when the tag contains nothing but a single piece of text, its string attribute:
    for a_tag in soup.select('span a'):
        categories.append(a_tag.string)
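To see how the two attributes differ, and why looping over the tag itself also worked in the question, here is a minimal sketch. The HTML string is made up for illustration (the second link with a nested b tag is an assumption, not something taken from the real page), and it uses Python's built-in html.parser backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first link mirrors the one in the question,
# the second is invented to show a tag with nested children.
html = (
    '<span><a href="https://www.whitehouse.gov/briefing-room/./statements-releases/" '
    'rel="category tag">Statements and Releases</a></span>'
    '<span><a href="#" rel="category tag">Press <b>Briefings</b></a></span>'
)
soup = BeautifulSoup(html, "html.parser")
first, second = soup.select("span a")

# .text concatenates every piece of text inside the tag.
text_first = first.text
text_second = second.text

# .string only works when the tag has exactly one child; otherwise it is None.
string_first = first.string
string_second = second.string

# Iterating over a tag yields its direct children. For the first link the
# only child is the text node, which is why the nested loop appeared to work.
children_first = list(first)
children_second = list(second)

print(text_first, "|", string_first, "|", children_first)
print(text_second, "|", string_second, "|", children_second)
```

For a link that contains only text, all three approaches give the same result. Once a link has nested markup, text is the one that still returns the full label.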
Upvotes: 1
Reputation: 20052
I suspect that this is what you need if you want Statements and Releases:
    for a_tag in soup.select('span a'):
        categories.append(a_tag.getText())
Whereas doing this:
    for a_tag in soup.select('span a'):
        categories.append(a_tag.attrs['rel'])
produces the value of the rel attribute, which is category tag (returned as the list ['category', 'tag']).
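A small sketch of that difference, assuming a one-link HTML string modelled on the tag in the question and the built-in html.parser backend. BeautifulSoup treats rel as a multi-valued attribute, so it comes back as a list, while href comes back as a plain string.

```python
from bs4 import BeautifulSoup

# Hypothetical one-link document modelled on the tag from the question.
html = (
    '<span><a href="https://www.whitehouse.gov/briefing-room/./statements-releases/" '
    'rel="category tag">Statements and Releases</a></span>'
)
soup = BeautifulSoup(html, "html.parser")
a_tag = soup.select_one("span a")

# Attributes describe the tag; they are not the visible link text.
rel_value = a_tag.attrs["rel"]    # multi-valued attribute, returned as a list
href_value = a_tag.attrs["href"]  # single-valued attribute, returned as a str

# The visible label comes from the tag's contents.
label = a_tag.get_text()

# Join the list if the original space-separated form is needed.
rel_joined = " ".join(rel_value)

print(rel_value, href_value, label, rel_joined)
```

get_text() is the snake_case spelling of getText(); both return the same string.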
Upvotes: 1