Reputation: 1955
Given this bit of HTML:
<td align="left">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/2000032">
Russell, Addison
</a>
SS OAK - Won at $0
<br>
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">
Vargas, Jason
</a>
SP LAA
<span title="Angels interested in bringing back Jason Vargas">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update">
<img border="0" height="10" src="http://sports.cbsimg.net/images/news-note-recent.gif" width="10"/>
</a>
</span>
- Dropped
</br>
</td>
I only want to show the blocks that do not have subtab="Update", but I haven't been able to figure out how to refer to the subtab attribute in a Python loop using BeautifulSoup. Here's what I attempted:
soup = BeautifulSoup(html)
pl = soup.findAll('a', {'class': 'playerLink'})
for a in pl:
    if a.subtab == "Update":
        print "UPDATE"
    else:
        print "Player Name: " + a.text
I also tried referring to the subtab attribute in the findAll call:
pl = soup.findAll('a',{'class': 'playerLink'}, {'subtype':0})
Neither approach works. The problem is that the class is 'playerLink' in every case, so the subtab attribute is the only way I can tell the links apart. I'm very new to BeautifulSoup, so I'm not very good at handling tags and attributes yet. The second example might work if I only wanted subtab="Update", but I want every <a> tag that has no subtab attribute at all.
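As far as I can tell, the first attempt fails because dot access on a BeautifulSoup Tag, such as a.subtab, is shorthand for a.find('subtab'): it searches for a child tag named <subtab>, not an attribute. Attributes are read with a['subtab'] or a.get('subtab'). In the second attempt, the third positional argument of findAll is recursive, so the dict is not treated as an attribute filter at all. A minimal Python 3 sketch, using a trimmed copy of the HTML above:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">Vargas, Jason</a>
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update"><img border="0" height="10" src="http://sports.cbsimg.net/images/news-note-recent.gif" width="10"/></a>
"""
soup = BeautifulSoup(html, "html.parser")
update_link = soup.find_all("a", class_="playerLink")[1]

# Dot access searches for a child <subtab> tag, so it finds nothing
child_lookup = update_link.subtab
# .get() reads the attribute and returns None when it is missing
attr_lookup = update_link.get("subtab")

print(child_lookup)  # None
print(attr_lookup)   # Update
```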
Upvotes: 2
Views: 2244
Reputation: 1
You can try this:
containers = page_soup.findAll("a", {"class": "playerLink"})
for container in containers:
    # rebuild each link from its href and its visible text
    url = "<a href='%s'>%s</a>" % (container.get("href"), container.get_text(strip=True))
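That loop rebuilds every playerLink anchor, including the "Update" icon links. One way to skip those is a CSS selector with :not([subtab]); this is a swapped-in technique rather than part of the answer above, and it assumes bs4 4.7 or later, where select() is backed by the soupsieve package. Here page_soup is built from a trimmed copy of the question's HTML:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<td align="left">
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/2000032">Russell, Addison</a> SS OAK - Won at $0
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">Vargas, Jason</a> SP LAA
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update"><img src="http://sports.cbsimg.net/images/news-note-recent.gif"/></a> - Dropped
</td>
"""
page_soup = BeautifulSoup(html, "html.parser")

links = []
# :not([subtab]) keeps only playerLink anchors that lack a subtab attribute
for container in page_soup.select("a.playerLink:not([subtab])"):
    links.append("<a href='%s'>%s</a>" % (container.get("href"), container.get_text(strip=True)))

print(links)
```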
Upvotes: 0
Reputation: 414235
a.attrs returns the <a> tag's attributes as a dictionary. You can check whether an <a> tag has no subtab attribute using 'subtab' not in a.attrs:
from bs4 import BeautifulSoup, SoupStrainer # pip install beautifulsoup4
player_links = SoupStrainer('a', 'playerLink')
soup = BeautifulSoup(html, parse_only=player_links)
names = [a.get_text().strip()
         for a in soup.find_all(player_links) if 'subtab' not in a.attrs]
print(names)
# -> ['Russell, Addison', 'Vargas, Jason']
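Tag also has a has_attr() method that performs the same presence check as 'subtab' not in a.attrs. A minimal sketch without SoupStrainer, using a trimmed copy of the question's HTML and the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/2000032">Russell, Addison</a> SS OAK
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">Vargas, Jason</a> SP LAA
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update"><img src="http://sports.cbsimg.net/images/news-note-recent.gif"/></a>
"""
soup = BeautifulSoup(html, "html.parser")

# has_attr() is True whenever the attribute exists, whatever its value
names = [a.get_text(strip=True)
         for a in soup.find_all("a", class_="playerLink")
         if not a.has_attr("subtab")]
print(names)  # ['Russell, Addison', 'Vargas, Jason']
```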
I can't find this mentioned in the documentation, but it seems that specifying subtab=False also works to exclude any tag that has a subtab attribute:
from bs4 import BeautifulSoup, SoupStrainer # pip install beautifulsoup4
player_links = SoupStrainer('a', 'playerLink', subtab=False)
soup = BeautifulSoup(html, parse_only=player_links)
names = [a.get_text().strip()
         for a in soup.find_all(player_links)]
print(names)
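Since find_all() accepts attribute filters as keyword arguments, the same subtab=False filter can also be passed to it directly, without a SoupStrainer. A sketch, assuming the behaviour described above (False matching tags where the attribute is absent) holds in your bs4 version; the HTML is a trimmed copy of the question's:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html = """
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/2000032">Russell, Addison</a>
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425">Vargas, Jason</a>
<a class="playerLink" href="http://bbroto.baseball.cbssports.com/players/playerpage/556425" subtab="Update"><img src="http://sports.cbsimg.net/images/news-note-recent.gif"/></a>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ filters on the CSS class; subtab=False keeps tags without that attribute
names = [a.get_text(strip=True)
         for a in soup.find_all("a", class_="playerLink", subtab=False)]
print(names)  # ['Russell, Addison', 'Vargas, Jason']
```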
If the matched tags (player_links) are not nested, you can omit the .find_all(player_links) call:
from bs4 import BeautifulSoup, SoupStrainer # pip install beautifulsoup4
player_links = SoupStrainer('a', 'playerLink', subtab=False)
soup = BeautifulSoup(html, parse_only=player_links)
names = [a.get_text().strip() for a in soup]
print(names)
Upvotes: 2
Reputation: 36262
You can use the Tag.get() method to check whether an element has an attribute; it returns None when the attribute is missing:
from bs4 import BeautifulSoup
import sys

with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html')
for a in soup.find_all('a', attrs={'class': 'playerLink'}):
    # a.get() reads the attribute; getattr(a, 'subtab') would instead search
    # for a child <subtab> tag, so it is not a reliable test here
    if a.get('subtab'):
        continue
    print(a.get_text("", strip=True))
Run it like:
python3 script.py htmlfile
It yields:
Russell, Addison
Vargas, Jason
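One caveat with a.get('subtab') as the test: it checks truthiness, so a tag with an empty value such as subtab="" would not be skipped, whereas has_attr() checks for presence only. The empty-valued link below is a hypothetical example, not taken from the question's page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link has an empty subtab attribute
html = ('<a class="playerLink" href="/p/1">Russell, Addison</a>'
        '<a class="playerLink" href="/p/2" subtab="">icon</a>')
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", class_="playerLink")

# get() returns "" for the empty attribute, which is falsy, so the link is kept
kept_by_get = [a.get_text(strip=True) for a in links if not a.get("subtab")]
# has_attr() only asks whether the attribute exists, so the link is dropped
kept_by_has_attr = [a.get_text(strip=True) for a in links if not a.has_attr("subtab")]

print(kept_by_get)       # ['Russell, Addison', 'icon']
print(kept_by_has_attr)  # ['Russell, Addison']
```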
Upvotes: 2
Reputation: 1955
Experimenting with the attrs attribute, I found that this works:
if str(a.attrs).find('subtab') > 0
It probably isn't the cleanest way to do it, but it works.
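The string search can give false positives: it also fires when 'subtab' merely appears inside another attribute's value. The href below is hypothetical, made up to illustrate the point:

```python
from bs4 import BeautifulSoup

# Hypothetical link whose href happens to contain the word "subtab"
html = '<a class="playerLink" href="/help/subtab-guide">Russell, Addison</a>'
a = BeautifulSoup(html, "html.parser").a

string_check = str(a.attrs).find('subtab') > 0  # matches inside the href value
attr_check = 'subtab' in a.attrs                 # looks at attribute names only

print(string_check, attr_check)  # True False
```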
Upvotes: 0
Reputation: 4894
A simple, if not particularly elegant, solution is to search for the string 'subtab' in each element's markup:
for a in pl:
    if 'subtab' in a.prettify():
        print "UPDATE"
    else:
        print "Player Name: " + a.text
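Because a.prettify() returns the tag's whole markup, this check also matches the link's text, not just its attributes. With a hypothetical link whose text contains the word, the string search and an attribute check disagree (Python 3 sketch):

```python
from bs4 import BeautifulSoup

# Hypothetical link: no subtab attribute, but the word appears in its text
html = '<a class="playerLink" href="/p/1">How to use subtab filters</a>'
a = BeautifulSoup(html, "html.parser").a

in_markup = 'subtab' in a.prettify()   # searches tag name, attributes and text
has_attribute = a.has_attr('subtab')   # checks attribute names only

print(in_markup, has_attribute)  # True False
```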
Upvotes: 1