Reputation: 249
I need to get the information within the "<b>" tags for each website.
import requests
from bs4 import BeautifulSoup

response = requests.get(href)  # href is the URL of the page being scraped
soup = BeautifulSoup(response.content, "lxml") # or BeautifulSoup(response.content, "html5lib")
tempWeekend = []
print(soup.findAll('b'))
The soup.findAll('b') line prints every <b> tag on the page. How can I limit it to just the dates I want?
The website is http://www.boxofficemojo.com/movies/?page=weekend&id=catchingfire.htm, under the weekend tab.
Upvotes: 0
Views: 173
Reputation: 38
I would try something like:
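dates = []
all_a = site.find_all('a')  # site is the BeautifulSoup object for the page
for a in all_a:
    # a.get avoids a KeyError on <a> tags that have no href
    if '?yr=' in a.get('href', ''):
        dates.append(a.get_text())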
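A similar filter can be written with BeautifulSoup's attribute filtering, which accepts a compiled regular expression as the value of href. This is a minimal sketch run against a small hypothetical HTML snippet (not the real Box Office Mojo markup) so it works offline; the link format "?yr=2013&wknd=47" is an assumption.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual links may differ.
html = """
<table>
  <tr><td><a href="/weekend/chart/?yr=2013&amp;wknd=47&amp;p=.htm"><b>Nov. 22-24</b></a></td></tr>
  <tr><td><a href="/weekend/chart/?yr=2013&amp;wknd=48&amp;p=.htm"><b>Nov. 29-Dec. 1</b></a></td></tr>
  <tr><td><a href="/movies/?id=other.htm"><b>Other</b></a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only links whose href contains "?yr=" (the "?" is escaped for the regex;
# BeautifulSoup applies the pattern with search, so it can match anywhere).
dates = [a.get_text() for a in soup.find_all("a", href=re.compile(r"\?yr="))]
print(dates)
```

Links without an href are simply skipped by the attribute filter, so no KeyError handling is needed.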
Upvotes: 0
Reputation: 3898
Why not search for all the <b> tags and choose the ones that contain a month?
import requests
from bs4 import BeautifulSoup

s = requests.get('http://www.boxofficemojo.com/movies/?page=weekend&id=catchingfire.htm').content
soup = BeautifulSoup(s, "lxml") # or BeautifulSoup(s, "html5lib")
months = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}
dates = []
for i in soup.find_all('b'):
    words = i.text.split()
    # keep <b> tags whose first word starts with a month abbreviation ("Nov.", "November", ...)
    if words and words[0][:3].upper() in months:
        dates.append(i.text)
print(dates)
(Note: I did not check the exact month abbreviations the website uses; check them first and modify the code accordingly.)
Upvotes: 1
Reputation: 2548
It is often easiest to search using CSS selectors, e.g.
soup.select('table.chart-wide > tr > td > nobr > font > a > b')
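A self-contained sketch of the selector approach, run on hypothetical markup shaped the way the selector above assumes (the real page's nesting may differ). One caveat: html5lib inserts an implicit <tbody> between <table> and <tr>, so a strict "table.chart-wide > tr" child chain can match nothing with that parser. The sketch uses html.parser, which keeps the markup as written, and also shows a descendant-selector variant that tolerates an inserted <tbody>.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real chart table may be nested differently.
html = """
<table class="chart-wide">
  <tr><td><nobr><font size="2"><a href="/weekend/chart/?yr=2013&amp;wknd=47"><b>Nov. 22-24</b></a></font></nobr></td></tr>
  <tr><td><nobr><font size="2"><a href="/weekend/chart/?yr=2013&amp;wknd=48"><b>Nov. 29-Dec. 1</b></a></font></nobr></td></tr>
</table>
<b>Not a date</b>
"""

soup = BeautifulSoup(html, "html.parser")

# Strict child chain, as in the answer; html.parser adds no <tbody>.
strict = [b.get_text() for b in soup.select("table.chart-wide > tr > td > nobr > font > a > b")]

# Descendant selector: still matches if a parser inserts <tbody>.
loose = [b.get_text() for b in soup.select("table.chart-wide tr a b")]

print(strict)
print(loose)
```

The <b> outside the table is ignored by both selectors because it does not sit inside table.chart-wide.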
Upvotes: 2
Reputation: 9038
Looking at that page, it doesn't have any divs or class or id attributes, which makes it tough. The only pattern I could see was that the <b> tag directly before the dates was <b>Date:</b>. I would iterate over the <b> tags and then collect the tags after I hit the one with Date in it.
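A minimal sketch of that idea: walk the <b> tags in document order and start collecting only after the one whose text is "Date:". The HTML below is a hypothetical stand-in for the page. As written, every <b> after the marker is collected, including any other bold headings, so on the real page you would probably need an extra filter or a stopping condition that this sketch does not know about.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a header row whose first cell is <b>Date:</b>, then data rows.
html = """
<b>Catching Fire</b>
<table>
  <tr><td><b>Date:</b></td><td>Rank</td></tr>
  <tr><td><a href="#"><b>Nov. 22-24</b></a></td><td>1</td></tr>
  <tr><td><a href="#"><b>Nov. 29-Dec. 1</b></a></td><td>1</td></tr>
</table>
"""

def bold_after_marker(soup, marker="Date:"):
    """Return the text of every <b> tag that follows the <b> whose text is `marker`."""
    collected = []
    seen_marker = False
    for b in soup.find_all("b"):
        text = b.get_text(strip=True)
        if seen_marker:
            collected.append(text)
        elif text == marker:
            seen_marker = True
    return collected

soup = BeautifulSoup(html, "html.parser")
print(bold_after_marker(soup))
```

The bold title before the marker is skipped because collection only starts once "Date:" has been seen.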
Upvotes: 0
Reputation: 1406
Sadly, if the tags are not further identified, there is no way to select specific ones; how would BeautifulSoup distinguish between them? If you know roughly what to expect in the tags you need, you could iterate over all of them and check whether they match:
def find_b(soup, whatever):
    # return the first <b> tag whose text equals `whatever`
    for b in soup.findAll('b'):
        if b.get_text() == whatever:
            return b
Or something like that.
Or you could get the surrounding tags (e.g. the 'a' tags in your example), check whether they match, and then get the next occurrence of 'b'.
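A sketch of that second suggestion: locate a surrounding tag that passes some test, then take the next <b> after it with BeautifulSoup's find_next, which searches forward in document order (including the tag's own children). The condition used here (an <a> whose href contains "wknd=") and the markup are hypothetical placeholders for whatever actually identifies the right tags on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; replace with the real page's HTML.
html = """
<p><a href="/movies/?id=catchingfire.htm"><b>Catching Fire</b></a></p>
<p><a href="/weekend/chart/?wknd=47"><b>Nov. 22-24</b></a></p>
<p><a href="/weekend/chart/?wknd=48"><b>Nov. 29-Dec. 1</b></a></p>
"""

soup = BeautifulSoup(html, "html.parser")

dates = []
for a in soup.find_all("a"):
    # Placeholder test for "the surrounding tag matches".
    if "wknd=" in a.get("href", ""):
        b = a.find_next("b")  # next <b> in document order after this <a> starts
        if b is not None:
            dates.append(b.get_text())
print(dates)
```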
Upvotes: 1