bellaxnov

Reputation: 19

How to print out a list of all article titles using beautifulsoup

I'm trying to print a list of all the article titles in the Michigan Daily's Most Read Articles box, as shown on the opinion page, with a blank line between titles.

This is what I have written so far, but class_="field-content" is not narrow enough to grab just the titles in the Most Read box.

import requests
from bs4 import BeautifulSoup

base_url = 'http://www.michigandaily.com/section/opinion' 
r = requests.get(base_url) 
soup = BeautifulSoup(r.text, "html5lib") 
for story_heading in soup.find_all(class_="field-content"):  
    if story_heading.a:  
        print(story_heading.a.text.replace("\n", " ").strip()) 
    # else:  
    #     print(story_heading.contents[0].strip())   

Any and all help is greatly appreciated and thank you in advance :)

Upvotes: 1

Views: 1548

Answers (1)

ballade4op52

Reputation: 2267

The page has three article sections. Each is a div with the class "view-content" that contains spans (with the class "field-content"), each wrapping an article link for that section. The third "view-content" div holds the "Most Read" articles. The following should retrieve only those articles by searching for "field-content" inside that third ("Most Read") div:

# Assumes `soup` was built as in the question.
mostReadSection = soup.find_all('div', class_="view-content")[2]  # third div is the Most Read section

storyHeadings = mostReadSection.find_all('span', class_="field-content")

for story_heading in storyHeadings:
    if story_heading.a:
        print(story_heading.a.text.replace("\n", " ").strip())
        print()  # blank line between titles, as the question asks
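
For anyone who wants to try the idea without hitting the site, here is a minimal, self-contained sketch. The SAMPLE_HTML markup and the most_read_titles helper are made up for illustration and only mimic the structure described above (three "view-content" divs, the third being Most Read); the real page may differ, and the index 2 is an assumption that breaks if the layout changes. It uses the built-in "html.parser" instead of "html5lib" purely to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the answer.
SAMPLE_HTML = """
<div class="view-content"><span class="field-content"><a href="/a">Top story</a></span></div>
<div class="view-content"><span class="field-content"><a href="/b">Column</a></span></div>
<div class="view-content">
  <span class="field-content"><a href="/c">First most-read
title</a></span>
  <span class="field-content">No link here</span>
  <span class="field-content"><a href="/d">Second most-read title</a></span>
</div>
"""


def most_read_titles(html, section_index=2):
    """Return link titles from the view-content div at section_index."""
    soup = BeautifulSoup(html, "html.parser")
    sections = soup.find_all("div", class_="view-content")
    # Guard against pages with fewer sections than expected.
    if len(sections) <= section_index:
        return []
    titles = []
    for span in sections[section_index].find_all("span", class_="field-content"):
        if span.a:  # skip spans that do not wrap a link
            # Collapse newlines and repeated whitespace inside the title.
            titles.append(" ".join(span.a.get_text().split()))
    return titles


titles = most_read_titles(SAMPLE_HTML)
# Joining with two newlines leaves one blank line between titles.
print("\n\n".join(titles))
```

Against the live page you would pass r.text from the question's requests.get call instead of SAMPLE_HTML. Returning a list rather than printing inside the loop keeps the spacing decision in one place.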

Upvotes: 1
