Reputation: 19
I'm trying to print a list of all the article titles in the Michigan Daily's Most Read Articles box, as shown on the opinion page, with a blank line between each title.
This is what I have written so far, but class="field-content" is not narrow enough to grab just the titles in the Most Read box.
import requests
from bs4 import BeautifulSoup

base_url = 'http://www.michigandaily.com/section/opinion'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html5lib")

for story_heading in soup.find_all(class_="field-content"):
    if story_heading.a:
        print(story_heading.a.text.replace("\n", " ").strip())
    # else:
    #     print(story_heading.contents[0].strip())
Any and all help is greatly appreciated and thank you in advance :)
Upvotes: 1
Views: 1548
Reputation: 2267
The page has three article sections. Each is a div with the class "view-content" containing spans (with the class "field-content"), and each span wraps an article link for that section. The third "view-content" div holds the "Most Read" articles. The following should retrieve only those articles by looking for "field-content" spans inside the third ("Most Read") div:
# Reuses the soup object built in the question.
mostReadSection = soup.find_all('div', {'class': "view-content"})[2]  # third section is "Most Read"
storyHeadings = mostReadSection.find_all('span', {'class': "field-content"})
for story_heading in storyHeadings:
    if story_heading.a:
        # print() as a function, matching the Python 3 style used in the question
        print(story_heading.a.text.replace("\n", " ").strip())
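If you want to try the idea without hitting the site, here is a minimal sketch that runs against inline markup. The SAMPLE_HTML string and the section_titles helper are made up for illustration and only assume the structure described above (three "view-content" divs, the third being "Most Read"); the real page may differ. It uses the built-in "html.parser" so html5lib is not required, collapses whitespace with split/join rather than replace, and joins the titles with a blank line between them, as the question asks.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: three "view-content"
# divs, with the third one playing the role of the "Most Read" box.
SAMPLE_HTML = """
<div class="view-content"><span class="field-content"><a href="/a">Section one story</a></span></div>
<div class="view-content"><span class="field-content"><a href="/b">Section two story</a></span></div>
<div class="view-content">
  <span class="field-content"><a href="/c">Most read
  story one</a></span>
  <span class="field-content">No link here</span>
  <span class="field-content"><a href="/d">Most read story two</a></span>
</div>
"""


def section_titles(html, index):
    """Return link titles from the index-th "view-content" div (0-based)."""
    soup = BeautifulSoup(html, "html.parser")
    sections = soup.find_all("div", class_="view-content")
    if index >= len(sections):
        return []  # page layout changed, or fewer sections than expected
    titles = []
    for span in sections[index].find_all("span", class_="field-content"):
        if span.a:  # skip spans that do not wrap a link
            # Collapse newlines and runs of spaces into single spaces.
            titles.append(" ".join(span.a.get_text().split()))
    return titles


titles = section_titles(SAMPLE_HTML, 2)
# A blank line between titles.
print("\n\n".join(titles))
```

Indexing with [2] is brittle if the site adds or reorders sections, so the bounds check returns an empty list rather than raising IndexError.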
Upvotes: 1